A friend asked me, “Is there a better way to reduce to a set than .reduce (Set<String>()) { $0.union(CollectionOfOne($1)) }
?” He was fetching results from an external source and wanting to feed them into a set.
We kicked some ideas back and forth about how this could be implemented. Would he need to query the set before fetching all the items (no) and would his data set be so large that it would be impractical to store the intermediate results into an array before creating a set (also no).
I built a suite of tests, trying out his reduce method, using normal insertion, etc. I assumed that using a Set
initializer would probably be the best approach for a pre-computed array, but it turns out that union
and insertion
performed better in repeated tests:
timetest("initializer") { // 0.652348856034223 var x: Set<String> = [] (1 ... 5_000).forEach { _ in x = Set(letters) } } timetest("union") { // 0.524669112986885 var x: Set<String> = [] (1 ... 5_000).forEach { _ in x = x.union(letters) } } timetest("insert") { // 0.572339564969297 var x: Set<String> = [] (1 ... 5_000).forEach { _ in x = [] letters.forEach ({ x.insert($0) }) } } timetest("reduce") { // 0.762973523989785 (1 ... 5_000).forEach { _ in var x = letters.reduce(Set<String>()) { $0.union(CollectionOfOne($1)) } } }
That surprised me since you’d imagine that init<Source : Sequence where Source.Iterator.Element == Element>(_ sequence: Source)
and func union<S : Sequence where S.Iterator.Element == Element>(_ other: S) -> Set<Element>
would have similar performance characteristics.
What didn’t surprise me was that trying to create a set on the go cost more than storing intermediate results to an array and then building a set out of them. So long as the array was reasonably limited in size (that is large enough to be non-trivial but not so large that it put a burden on application memory), an intermediate array seems to be the better approach. Set(collectedResults)
outperformed insert
, formUnion
, and reduce/union
for non-trivial result sizes.
4 Comments
Curious about Dictionary’s performance characteristics here as an intermediary store vs Array or Set. There may be some smart switching in the Dictionary implementation that will yield speed gains at multiple different sizes of N if you just add every item to the dictionary, then use the keys accessor to build a set.
In addition, the .keys member returns a lazy collection, which can get around the space concerns.
How did you generate the “letters” variable? I’m eager to try it out myself.
My working notes are here: https://gist.github.com/erica/ba8d60ec26c5459b9d3b4330da926c07
I tested using a macOS command line utility. Make sure you enable whole module optimization, which is not enabled by default.
xCode 8.2.1 – run your example in a playground.
First two run comparable to your findings, but the “insert” and “reduce” take minutes
Avoid time tests in playgrounds. Try running in a standalone command line app, with whole module optimization enabled.