A while back, I posted about counting strings in Swift. I discovered that countElements() offered the most consistently correct results regardless of string content. Time has passed. With Swift 1.2, countElements() has been replaced with count(), which of course has made updating code and libraries a great deal of fun because there’s nothing more exciting than a language past its 1.0 debut that continues to change syntax details.
Last night, I ran into an interesting string counting situation with countElements()/count()/countChocula() even as I doggedly insisted on its superior correctness in string testing.
Ken Ferry pointed out that the count call requires O(N) time. In comparison, utf16count is O(1). As my previous experiments showed, when you’re using complex strings with emoji, the answer with utf16count will be wrong but as Ken correctly noted, you’ll get the wrong answer really fast. For a 33554432-character string I threw together, that answer took 8.42 seconds with count and 3.6e-05 with utf16count. Pretty big performance difference.
So why do O(1) results matter? As Ken pointed out, “Wrong depends on why you’re looking at it.” There’s no instance in which the utf16count will be wrong for zero length checks, such as if myString.utf16Count == 0. So if you’re going to perform a lot of those tests, it starts to matter. For a zero length string of “”, 100000 comparisons on my Mac mini took 0.015 seconds with utf16Count versus 0.04 for count(), a significant slowdown.
Turns out, however, that there’s an even better solution, which is to check whether the string is equal to “”. Nicolás Alvarez suggested this approach and I put it to the test, again running each comparison 100000 times. Results? Just 0.005 seconds to perform the == “” tests compared to 0.015 for the utf16Count ones.
Update: Davide De Franceschi asks, what about using .isEmpty. Answer: Not too shabby:
0.00494605302810669 // == "" 0.0156670212745667 // utf16count 0.0351709723472595 // count() 0.00633901357650757 // isEmpty
Update Alex Pretzlav adds, Your string counting question reminded me I had written this little guy:
func isEmptyOrNil<C : CollectionType>(x: C?) -> Bool { return x.map { isEmpty($0) } ?? true }
3 Comments
`String.utf16Count` only exists when your code needs compatible with obj-c. `NSString.count` returns the number of UTF-16 code units in the NSString, which is what `String.utf16Count` returns. If you need to know if a String is empty, using the `.isEmpty` property (or the `isEmpty()` function) is the best way, although comparing `== “”` is also acceptable.
Swift 1.2ß2 actually removes the `.utf16Count` property entirely, in favor of `count(str.utf16)`. Probably because pure-Swift code shouldn’t be using `.utf16Count` at all and it was potentially confusing to people as to how they should be counting their strings. Also note that `.utf16Count` couldn’t possibly be O(1) unless the string is encoded as UTF-16. It’s very plausible that calling `.utf16Count` converted the string’s backing storage into an `NSString` and called `count` on that, though I haven’t done any tests. In that case, the very first invocation would be potentially expensive and subsequent invocations would be O(1). But since the property is gone now, it’s a bit moot.
It’s plausible that `count(str.utf16)` is O(1) for NSString-backed Strings. While `count()` is documented as being O(N) for any collection that doesn’t use `RandomAccessIndexType`, there’s still a potential for “hidden” optimized paths. In fact, in a quick test, the optimized SIL generated for `count(s.utf16)` appears to be a O(1) lookup of an internal count in the string.
[…] utf16Count, recently discussed. Instead “use count on the UTF16 view of the String”, […]
Interesting stuff but dangerously close to “premature optimisation”. If you’re worried about the 0.015 seconds agains 0.04 seconds to find the (zero) length of 100,000 Strings, then the code should be being writing using such a high level language as Swift.