Swift: Counting emoji groups

Update: Confirmed as a bug by Chris Lattner (“Indeed. We consider this to be a bug, not a feature, and are tracking it as rdar://20511834”). Thanks to Joseph Lord for the heads up.

Apple’s new emojis bring developments that are exciting both culturally and programmatically. Take care: your existing string-counting routines (“countElements” and, later, “count”) may not take composed character sequences into account. Here’s an example of where this technology can trip you up.

Here’s a single emoji. When you count it, you get one character.

Here’s another single emoji. When you count it, you get two characters. That’s because it is a new-style emoji composition.
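
A minimal sketch of the difference, assuming the thumbs-up emoji (U+1F44D) and the medium skin-tone modifier (U+1F3FD) as stand-ins, since the exact emoji used in this post are not specified. It counts Unicode scalars and UTF-16 units, which are stable across Swift versions. What `count` reports for the composed string depends on the Swift release: recent versions treat it as a single Character, so the value is printed rather than assumed.

```swift
// Stand-in emoji (an assumption, not necessarily the ones pictured in the post).
let plain = "\u{1F44D}"               // thumbs-up on its own
let newemoji = "\u{1F44D}\u{1F3FD}"   // thumbs-up followed by a skin-tone modifier

// Unicode scalars: the building blocks of each string.
print(plain.unicodeScalars.count)     // 1
print(newemoji.unicodeScalars.count)  // 2

// UTF-16 code units: what NSString's length reports.
print(plain.utf16.count)              // 2
print(newemoji.utf16.count)           // 4

// Character count: varies with the Swift version in use.
print(newemoji.count)
```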

Some emoji now offer grouping and skin-tone variations, like this one does.

If you iterate through the emoji characters, you see that the newemoji example consists of two items, composited together to produce the emoji-of-color.
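
The iteration can be sketched as follows. The contents of `newemoji` here are an assumption (thumbs-up plus a medium skin-tone modifier), and the loop walks the string's Unicode scalars, since recent Swift versions no longer split this sequence when iterating Characters.

```swift
// Assumed contents: base emoji U+1F44D followed by skin-tone modifier U+1F3FD.
let newemoji = "\u{1F44D}\u{1F3FD}"

// Print each scalar's code point in the usual U+XXXX notation.
for scalar in newemoji.unicodeScalars {
    print("U+" + String(scalar.value, radix: 16, uppercase: true))
}
// U+1F44D
// U+1F3FD
```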

You see similar groupings in family-style and couple-style emojis. Here’s a family group. (I’m still trying to figure out where the blue and green shirts come into this.)
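
As a rough illustration, assuming the family is the man, woman, girl, boy sequence: the four people are joined by zero-width joiners (U+200D), so the string holds seven scalars but only four family members. The specific sequence is a guess at what the post's family string contained.

```swift
// Assumed family sequence: man, woman, girl, boy, each pair joined by U+200D.
let zwj: Unicode.Scalar = "\u{200D}"
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// Drop the joiners to leave only the people.
let members = family.unicodeScalars.filter { $0 != zwj }

print(family.unicodeScalars.count)  // 7 (four people plus three joiners)
print(members.count)                // 4
```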

Over at devforums, Andrew Carter posted a solution for handling composed sequences. It involves enumerating substrings using grouped character sequences. Here’s my take on that solution, creating a computed composedCount property for strings.
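
A minimal sketch of such a property in current Swift syntax, assuming Foundation is available. It is not necessarily the code from the devforums thread; it only follows the approach described, enumerating substrings by composed character sequence and counting them.

```swift
import Foundation

extension String {
    /// Number of composed character sequences, as enumerated by Foundation.
    var composedCount: Int {
        var count = 0
        enumerateSubstrings(
            in: startIndex..<endIndex,
            options: [.byComposedCharacterSequences, .substringNotRequired]
        ) { _, _, _, _ in
            count += 1   // one callback per composed sequence
        }
        return count
    }
}

print("abc".composedCount)       // 3
print("e\u{301}".composedCount)  // 1 (e plus a combining acute accent)
```

How emoji sequences are grouped depends on the Foundation and OS version doing the enumeration, so test emoji strings on the platforms you target.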

You end up with a count that treats composed sequences as single entities, returning a count of 1 instead of 4 for the family string shown above.

There are several notes in the standard library module about composed characters. One is for substringWithRange (“Hint: Use with rangeOfComposedCharacterSequencesForRange: to avoid breaking up composed characters”). Another reads: “Note that the length of the range returned by these methods might be different than the length of the target string, due composed characters and such.”
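
The substringWithRange hint can be sketched like this, using NSString's current Swift spelling, `rangeOfComposedCharacterSequences(for:)`. The example string is an assumption chosen for predictability: an "e" followed by a combining acute accent rather than an emoji.

```swift
import Foundation

// "é!" spelled as e + combining acute accent (U+0301) + "!".
let text = NSString(string: "e\u{301}!")

// A naive range that covers only the base letter and splits the sequence.
let naive = NSRange(location: 0, length: 1)

// Widen the range so it never cuts a composed character in half.
let safe = text.rangeOfComposedCharacterSequences(for: naive)

print(text.substring(with: naive).unicodeScalars.count)  // 1 (accent lost)
print(text.substring(with: safe).unicodeScalars.count)   // 2 (accent kept)
print(safe.length)                                       // 2
```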

4 Comments

  • cool~

  • Nice post! How can these kinds of characters be supported on a backend server, e.g. Ruby or PHP?

  • How do you separate a string into an array of emoji groups?

  • I’ve tried to separate a string into an array of emoji groups but didn’t succeed.