Reducing to Swift sets

A friend asked me, “Is there a better way to reduce to a set than .reduce (Set<String>()) { $0.union(CollectionOfOne($1)) } ?” He was fetching results from an external source and wanting to feed them into a set.

We kicked some ideas back and forth about how this could be implemented. Would he need to query the set before fetching all the items (no) and would his data set be so large that it would be impractical to store the intermediate results into an array before creating a set (also no).

I built a suite of tests, trying out his reduce method, using normal insertion, etc.  I assumed that using a Set initializer would probably be the best approach for a pre-computed array, but it turns out that union and insertion performed better in repeated tests:

timetest("initializer") { //  0.652348856034223
    var x: Set<String> = []
    (1 ... 5_000).forEach { _ in
        x = Set(letters)
    }
}

timetest("union") { // 0.524669112986885
    var x: Set<String> = []
    (1 ... 5_000).forEach { _ in
        x = x.union(letters)
    }
}

timetest("insert") { // 0.572339564969297
    var x: Set<String> = []
    (1 ... 5_000).forEach { _ in
        x = []
        letters.forEach ({ x.insert($0) })
    }
}

timetest("reduce") { //  0.762973523989785
    (1 ... 5_000).forEach { _ in
        var x = letters.reduce(Set<String>()) {
            $0.union(CollectionOfOne($1))
        }
    }
}

That surprised me since you’d imagine that  init<Source : Sequence where Source.Iterator.Element == Element>(_ sequence: Source) and func union<S : Sequence where S.Iterator.Element == Element>(_ other: S) -> Set<Element> would have similar performance characteristics.

What didn’t surprise me was that trying to create a set on the go cost more than storing intermediate results to an array and then building a set out of them. So long as the array was reasonably limited in size (that is large enough to be non-trivial but not so large that it put a burden on application memory), an intermediate array seems to be the better approach. Set(collectedResults) outperformed insert, formUnion, and reduce/union for non-trivial result sizes.

4 Comments

  • Curious about Dictionary’s performance characteristics here as an intermediary store vs Array or Set. There may be some smart switching in the Dictionary implementation that will yield speed gains at multiple different sizes of N if you just add every item to the dictionary, then use the keys accessor to build a set.

    In addition, the .keys member returns a lazy collection, which can get around the space concerns.

    How did you generate the “letters” variable? I’m eager to try it out myself.

  • xCode 8.2.1 – run your example in a playground.
    First two run comparable to your findings, but the “insert” and “reduce” take minutes

    • Avoid time tests in playgrounds. Try running in a standalone command line app, with whole module optimization enabled.

Join the Discussion

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>