I don’t use Unicode all that often, but when I do, I tend to reach for character-picker copypasta or hex codes:
```swift
var ghoti = "🐟" // from character picker
print(ghoti) // 🐟

ghoti = String(UnicodeScalar(0x1F41F)!)
print(ghoti) // 🐟
```
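If you work from hex codes a lot, it may help to see a few standard-library routes side by side. This is a small sketch, not taken from the snippet above. The variable names are made up for illustration, and U+1F420 (TROPICAL FISH) is only there to show more than one code point.

```swift
// FISH is U+1F41F; TROPICAL FISH is U+1F420.

// 1. The \u{...} escape inside a string literal
let fromLiteral = "\u{1F41F}"

// 2. Unicode.Scalar's failable initializer (nil for surrogates or values above 0x10FFFF)
let fromScalar = Unicode.Scalar(UInt32(0x1F41F)).map { String(Character($0)) } ?? ""

// 3. Several code points at once, appended to a UnicodeScalarView
let codes: [UInt32] = [0x1F41F, 0x1F420]
var view = String.UnicodeScalarView()
view.append(contentsOf: codes.compactMap { Unicode.Scalar($0) })
let fishes = String(view)

// And back again: the uppercase hex code of every scalar in a string
let hexCodes = fishes.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }

print(fromLiteral == fromScalar) // true
print(hexCodes)                  // ["1F41F", "1F420"]
```

The literal escape is the least typing. The `Unicode.Scalar` route is what you want when the value arrives at runtime.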
Cocoa also supports loading Unicode characters by name using the \N{UNICODE CHARACTER NAME} escape sequence. You can embed these escapes in a string and expand them with the toUnicodeName transform, as in the following example:
```swift
import Foundation

// Make sure to escape the backslash with a second
// backslash to allow proper string construction
let fish = "\\N{FISH}"
    .applyingTransform(StringTransform.toUnicodeName, reverse: true) // 🐟

let constructed = "I want to eat a \\N{FISH} sandwich"
    .applyingTransform(StringTransform.toUnicodeName, reverse: true)
print(constructed!) // "I want to eat a 🐟 sandwich"
```
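If the doubled backslash bothers you, Swift 5 raw strings (`#"..."#`) keep the backslash literal. Here is a small sketch, assuming a Swift 5 or later toolchain. It uses the same Foundation transform as above, and the variable names are illustrative.

```swift
import Foundation

// Inside #"..."# the backslash is literal, so \N{...} needs no doubling
let raw = #"I want to eat a \N{FISH} sandwich"#
let doubled = "I want to eat a \\N{FISH} sandwich"
print(raw == doubled) // true

// The same reverse transform turns the name escape into the character
if let expanded = raw.applyingTransform(.toUnicodeName, reverse: true) {
    print(expanded) // I want to eat a 🐟 sandwich
}
```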
This is a reverse transform, in that it converts from escaped names to the symbols they represent. The forward transform takes a string and replaces its Unicode characters with name escape sequences:
```swift
let transformed = "🐶🐮💩"
    .applyingTransform(StringTransform.toUnicodeName, reverse: false)
// \N{DOG FACE}\N{COW FACE}\N{PILE OF POO}
```
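You can also build the forward direction without Foundation, using the standard library's `Unicode.Scalar.Properties.name`, which assumes Swift 5 or later. This is only a sketch: `unicodeNameEscaped` is a hypothetical helper, not a Foundation API. Falling back to a `\u{...}` hex escape for unnamed scalars is my own choice and may not match what the Foundation transform does for those.

```swift
// Hypothetical helper: replace every scalar with a \N{NAME} escape
func unicodeNameEscaped(_ string: String) -> String {
    var result = ""
    for scalar in string.unicodeScalars {
        if let name = scalar.properties.name {
            result += "\\N{\(name)}"
        } else {
            // Control characters and the like have no name; use a hex escape instead
            result += "\\u{\(String(scalar.value, radix: 16, uppercase: true))}"
        }
    }
    return result
}

print(unicodeNameEscaped("\u{1F436}\u{1F42E}\u{1F4A9}"))
// \N{DOG FACE}\N{COW FACE}\N{PILE OF POO}
```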
Unicode name escapes also work in Cocoa regex matching. This example searches a string for the little blue fish and prints the rest of the string from the match onward:
```swift
import Foundation

let fishPattern = "\\N{FISH}"
let regex = try! NSRegularExpression(pattern: fishPattern, options: [])
let string = "I wish I had a 🐟 to eat"

// You have to use Cocoa-style ranges. Ugh. NSRange counts UTF-16
// code units, not Characters, so build it from Swift indices.
let range = NSRange(string.startIndex..., in: string)

// There's a fair degree of turbulence between
// the Cocoa API and Swift here, especially with
// the Boolean stop pointer
regex.enumerateMatches(in: string, options: [], range: range) { (result, flags, stopBoolPtr) in
    guard let result = result,
          let matchRange = Range(result.range, in: string) else {
        print("Missing text checking result"); return
    }
    let substring = string[matchRange.lowerBound...]
    print(substring) // "🐟 to eat"
}
```
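When you want every match rather than a callback per match, `matches(in:options:range:)` returns an array. Here is a rough sketch that converts each `NSRange` back into a Swift `Substring` with `Range(_:in:)`. The sample text and names are made up, and U+1F420 (TROPICAL FISH) is there to show that the pattern matches only FISH.

```swift
import Foundation

// Two FISH and one TROPICAL FISH
let text = "\u{1F41F} one, \u{1F41F} two, \u{1F420} not this one"
let fishRegex = try! NSRegularExpression(pattern: #"\N{FISH}"#, options: [])
let fullRange = NSRange(text.startIndex..., in: text)

// Convert each NSRange back to a Range<String.Index>, then slice
let found = fishRegex.matches(in: text, options: [], range: fullRange).compactMap { match in
    Range(match.range, in: text).map { text[$0] }
}
print(found.count) // 2
```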
It’s hard going back from Swift’s string indexing model to Cocoa’s NSRange system. Native regex can’t arrive soon enough.
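Much of the friction comes from the two models counting different things. `String.count` counts Characters, but NSRange offsets count UTF-16 code units. The sketch below uses only the standard library. It assumes Swift 5's `utf16Offset(in:)` and `String.Index(utf16Offset:in:)`, and its names are illustrative.

```swift
let sentence = "I wish I had a \u{1F41F} to eat"
print(sentence.count, sentence.utf16.count) // 23 24

// Swift index -> UTF-16 offset (what an NSRange.location would hold)
let fishIndex = sentence.firstIndex(of: "\u{1F41F}")!
let location = fishIndex.utf16Offset(in: sentence)
print(location) // 15

// UTF-16 offset -> Swift index, then slice
let backAgain = String.Index(utf16Offset: location, in: sentence)
print(sentence[backAgain...]) // 🐟 to eat

// The fish takes two UTF-16 units, so the next Character starts at 17
print(sentence.index(after: fishIndex).utf16Offset(in: sentence)) // 17
```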
You can also break a Unicode scalar down into its UTF-16 components:
```swift
let utf16View = UnicodeScalar("🐟")!.utf16
print(utf16View[0], utf16View[1]) // 55357 56351
print(String(utf16View[0], radix: 16), String(utf16View[1], radix: 16)) // d83d dc1f
```
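The two numbers aren't magic. Any code point above U+FFFF is split into a surrogate pair. You subtract 0x10000, then add the top 10 bits of the result to 0xD800 and the bottom 10 bits to 0xDC00. Here is a sketch of that arithmetic, checked against Swift's own `utf16` view. `surrogatePair(for:)` is a hypothetical helper, not a standard-library API.

```swift
// Hypothetical helper: split a supplementary-plane code point into a surrogate pair
func surrogatePair(for codePoint: UInt32) -> (high: UInt16, low: UInt16)? {
    // Only U+10000...U+10FFFF need a pair
    guard (0x10000...0x10FFFF).contains(codePoint) else { return nil }
    let offset = codePoint - 0x10000             // leaves a 20-bit value
    let high = UInt16(0xD800 + (offset >> 10))   // top 10 bits
    let low = UInt16(0xDC00 + (offset & 0x3FF))  // bottom 10 bits
    return (high, low)
}

if let pair = surrogatePair(for: 0x1F41F) {
    print(String(pair.high, radix: 16), String(pair.low, radix: 16)) // d83d dc1f
    print(Array("\u{1F41F}".utf16) == [pair.high, pair.low])       // true
}
```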
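This scalar approach goes boom when you try to push into highly composed characters. A family emoji built with zero-width joiners is one Character made of seven scalars, so UnicodeScalar(_:) returns nil and the force unwrap traps.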
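Here is a quick way to see the problem without crashing. It assumes the family emoji is MAN, WOMAN, BOY, BOY joined by ZERO WIDTH JOINER (U+200D), which matches the UTF-16 dump below. I've written it with escapes so the invisible joiners show up in the source.

```swift
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}"

print(family.count)                // 1 (one Character)
print(family.unicodeScalars.count) // 7 (four people, three joiners)
print(family.utf16.count)          // 11 (4 surrogate pairs + 3 joiners)

// UnicodeScalar(_:) accepts only a string holding exactly one scalar
print(UnicodeScalar(family) == nil) // true
```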
Instead, walk the whole string’s UTF-16 view:
```swift
// The family emoji: MAN, ZWJ, WOMAN, ZWJ, BOY, ZWJ, BOY
let utf16View = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}".utf16
for c in utf16View {
    print(c, "\t", String(c, radix: 16))
}
// 55357  d83d
// 56424  dc68
// 8205   200d
// 55357  d83d
// 56425  dc69
// 8205   200d
// 55357  d83d
// 56422  dc66
// 8205   200d
// 55357  d83d
// 56422  dc66
```
It’s interesting to see the four d83d components in there.
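The repeated d83d is no coincidence. Every code point from U+1F400 through U+1F7FF shares the high surrogate 0xD83D, and all four people emoji fall in that block. The sketch below walks the same string one scalar at a time and prints which UTF-16 units each scalar contributes. It leans on `Unicode.Scalar.utf16`, the same view the earlier fish example used.

```swift
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}"

for scalar in family.unicodeScalars {
    let units = scalar.utf16.map { String($0, radix: 16) }
    print(String(scalar.value, radix: 16, uppercase: true), units)
}
// 1F468 ["d83d", "dc68"]
// 200D ["200d"]
// 1F469 ["d83d", "dc69"]
// 200D ["200d"]
// 1F466 ["d83d", "dc66"]
// 200D ["200d"]
// 1F466 ["d83d", "dc66"]

// Keep only the high surrogates (0xD800...0xDBFF)
let highSurrogates = family.utf16.filter { (0xD800...0xDBFF).contains($0) }
print(highSurrogates.count) // 4
```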
Got any fun little Unicode tricks? Drop a comment, a tweet, or an email and let me know.
Update:
SE-0178: Add unicodeScalars property to Character is accepted without revision. https://t.co/RLpwGkuw2c
— ericasadun (@ericasadun) May 17, 2017
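With SE-0178's `unicodeScalars` on `Character`, you can take a single Character apart directly, without a round trip through `String`. Here is a small sketch, assuming a toolchain that includes SE-0178 and Swift 5's `properties.name`. I build the Character at runtime with `Character(_:)` rather than a literal. That is a conservative choice on my part, not a requirement.

```swift
let family = Character("\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}")

// Drop the ZERO WIDTH JOINERs and name the people that remain
let members = family.unicodeScalars.filter { $0 != "\u{200D}" }
let names = members.compactMap { $0.properties.name }

print(family.unicodeScalars.count) // 7
print(names)                       // ["MAN", "WOMAN", "BOY", "BOY"]
```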
From the comments: d83d is a high-surrogate code unit. “d83d dc66” is the surrogate pair for U+1F466, BOY. Because UTF-16 is so much fun.
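Going from the commenter’s pair back to U+1F466 reverses the split. Take 10 bits from each surrogate, shift the high half left by 10, and add 0x10000. Here is a sketch of that arithmetic. `codePoint(high:low:)` is a hypothetical helper, and the name lookup assumes Swift 5's `properties.name`.

```swift
// Hypothetical helper: rebuild a code point from a UTF-16 surrogate pair
func codePoint(high: UInt16, low: UInt16) -> UInt32? {
    guard (0xD800...0xDBFF).contains(high), (0xDC00...0xDFFF).contains(low) else { return nil }
    // 0x10000 + (high bits << 10) + low bits
    return 0x10000 + (UInt32(high - 0xD800) << 10) + UInt32(low - 0xDC00)
}

if let value = codePoint(high: 0xD83D, low: 0xDC66) {
    print(String(value, radix: 16, uppercase: true))     // 1F466
    print(Unicode.Scalar(value)?.properties.name ?? "?") // BOY
}
```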