Swift’s built-in split
function returns an array of subsequences, split from a sequence or collection’s elements. For example, you may have a string and want to split it into words. Here’s a simple approach that consumes whitespace, newline, and punctuation characters:
import Foundation let string = """ Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent in purus a ante semper congue posuere at lacus. Nullam faucibus sem vel sem vestibulum, a ullamcorper nunc auctor. """ extension Character { var firstScalar: UnicodeScalar { return self.unicodeScalars.first ?? UnicodeScalar(0) } } // NOTE: This code demonstrates a simple way to split a collection. // It's not a great way to actually tokenize a string, lacking both // localization and in-word punctuation handling such as // hyphens and single quotes. let targetSet = CharacterSet.whitespacesAndNewlines .union(CharacterSet.punctuationCharacters) let isWordSeparator: (Character) -> Bool = { c in targetSet.contains(c.firstScalar) } let words = string.split(whereSeparator: isWordSeparator)
But what happens when you want to split a sequence or collection without consuming values? There’s no built-in Swift solution to turn to.
A few months ago, Soroush Khanlou and I were playing around with this problem, trying to return subsequences around predicate boundaries. When the predicate failed, a new subsequence would begin. We wanted to break sequences along logical lines, for example, where a value changed or a slope trend updated from increasing to decreasing:
let x = [1, 2, 2, 3, 3, 3, 1] .sliced(where: { $0 != $1 }) // [ArraySlice([1]), ArraySlice([2, 2]), ArraySlice([3, 3, 3]), ArraySlice([1])] let z = [1, 2, 2, 1, 3, 3, 1].sliced(where: >) // [ArraySlice([1, 2, 2]), ArraySlice([1, 3, 3]), ArraySlice([1])]
We decided to work with arrays and use ArraySlice
instances. Each ArraySlice
provides a view onto a larger array:
Instead of copying over the elements of a slice to new storage, an `ArraySlice` instance presents a view onto the storage of a larger array. And because `ArraySlice` presents the same interface as `Array`, you can generally perform the same operations on a slice as you could on the original array.
Slices don’t allocate new storage so the splits are efficient. Be careful not to hold onto slices long term. Each slice holds a reference to the array it describes, so you can run into memory management issues if the slice outlives the array it points to.
For safety, reference the start and end of each slice with startIndex
and endIndex
. These values are often not zero
or the array count, the way you’d expect with actual array instances:
let array = ["a", "b", "c", "d", "e"] let slice = array[1...3] print(slice.startIndex) // 1 print(slice.endIndex) // 4
Here’s the code I eventually settled on. It extends ArraySlice
with a recursive slicing function that ends when it runs out of data or exceeds the maximum partition count. For each slice, it walks the elements until it finds a member element that fails the predicate test, at which point it recurses with a new slice. The ArraySlice
code is hidden behind an Array
entry point, which wraps the slice functionality away from public view.
Here are some examples of how the code could be used to break apart arrays. They show creating subsequences and run length encoding. You could easily adapt this to more complex data. For example, you might use keypaths to chunk data structures based on time stamps or look for inflection points for rising and falling values.
// Same value let x = [1, 2, 2, 3, 3, 3, 1] .sliced(where: { $0 != $1 }) // [ArraySlice([1]), ArraySlice([2, 2]), ArraySlice([3, 3, 3]), ArraySlice([1])] // Run-length encoding let xx = [1, 2, 2, 3, 3, 3, 3, 3, 1, 1, 2, 1, 1] .sliced(where: !=) .map { ($0.count, $0.first!) } // [(1, 1), (2, 2), (5, 3), (2, 1), (1, 2), (2, 1)] // Strings let y = Array("aaaabbbbcccdef") .sliced(where: !=) .map({ String($0) }) // ["aaaa", "bbbb", "ccc", "d", "e", "f"] // Increasing subsequences let z = [1, 2, 2, 1, 3, 3, 1].sliced(where: >) // [ArraySlice([1, 2, 2]), ArraySlice([1, 3, 3]), ArraySlice([1])] // Decreasing subsequences let w = [1, 2, 2, 1, 3, 3, 1].sliced(where: <) // [ArraySlice([1]), ArraySlice([2, 2, 1]), ArraySlice([3, 3, 1])]
Got suggestions for improvements? Drop a note in the comments. Thanks!
One Comment
Isn’t the predicate for this function usually positive? (i.e., the inverse of this one) Like in itertools or Data.List.
I think you can make it lazy, also, and generic over collections: