Sunday, October 16, 2016

Regular Expression Capture Groups in Swift 3 on Linux

I recently decided to have a go at using Swift. I’ve dabbled with Swift before and run through the odd tutorial, but this time I wanted to actually use it to accomplish something. I tend to internalize programming languages far better that way. And since Swift is open source and touted as a contender for a server-side language, I wanted to try Swift on Linux.

As I got going, I found I needed to use regular expression capture groups to find some parts of a string and then return an array of those parts. In most modern programming languages, this would be a perfectly straightforward exercise. Nearly all programming languages these days include a regular expression engine of some sort in their standard libraries and a little Googling will tell you all you need to know about the peculiarities of that language’s regular expression syntax and the APIs for using it. Swift, though, is a bit different.

Swift’s own standard library (by which I mean the included objects and functions that are Swift from the ground up and are not derived from Objective-C predecessors) is actually quite small. It consists mostly of built-in types like Array and String. Most of the rest of what you would expect from a standard library, including the regular expression engine, comes from Foundation and actually consists of bridges to Objective-C objects. As far as I can tell, bridging to Objective-C objects works just fine for the most part; Swift was designed, after all, to interoperate easily with Objective-C. But there are corner cases where frustration flourishes.

To get started, I found a Stack Overflow answer which proposed the following Swift 3/Xcode 8 compatible function for getting the matching substrings:

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

The main idea here is to cast the Swift String into an Objective-C NSString. The NSString then supplies the length needed to build the NSRange that is passed to NSRegularExpression's matches method, and it also extracts the substring for each match. This code does indeed compile and execute correctly on macOS using Xcode 8.
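
The reason the range is built from the NSString length rather than from the String's character count is that NSRange is measured in UTF-16 code units. A minimal sketch of the difference follows; the sample text is made up for illustration, and `sample.count` assumes Swift 4 or later (in Swift 3 the equivalent would be `sample.characters.count`).

```swift
import Foundation

// Hypothetical sample text: "café" with a precomposed é, a space, and an emoji.
let sample = "caf\u{E9} \u{1F389}"

// NSString.length counts UTF-16 code units, the same units NSRange uses.
let utf16Length = NSString(string: sample).length

// String's utf16 view reports the same number without going through NSString.
let viewLength = sample.utf16.count

// The character count is smaller because the emoji needs two UTF-16 code units.
let characterCount = sample.count

print(utf16Length, viewLength, characterCount)  // prints "7 7 6"
```

Passing the character count as the NSRange length would cut the search short on text like this, which is why the function goes through NSString.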

On Linux, however, I immediately ran into problems. The first is that Swift would absolutely not allow me to use the name NSRegularExpression; I had to use the munged name RegularExpression. (The idea seems to be that if there are Swift-native equivalents — as String is a Swift-native equivalent to NSString — then you can use the NS prefix to still refer to the Objective-C version. If there is not a Swift-native equivalent — as is the case with NSRegularExpression — then you must always remove the NS prefix.)

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try RegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Once that was fixed, I got errors about missing arguments for RegularExpression and its matches method. Adding the options argument to both calls was easy, though it remains curious that it would be required on Linux but not on macOS.

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try RegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

The more vexing problem was that I got an error regarding the conversion of String to NSString: “cannot convert value of type ‘String’ to type ‘NSString’ in coercion”. Initially, I assumed that NSString was simply not available on Linux and I went down quite a rabbit hole of trying to get an NSRange from String so that I could avoid using NSString altogether. It turns out, however, that doing so was unnecessary. NSString is indeed available in Swift on Linux and although the type coercion mysteriously doesn’t work, instantiating an NSString manually does:

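import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        // Compile the pattern; an invalid pattern throws and is handled below.
        let regex = try RegularExpression(pattern: regex, options: [])
        // Build the NSString explicitly, since "text as NSString" fails on Linux.
        let nsString = NSString(string: text)
        // Search the whole string; the NSRange length comes from the NSString.
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        // Pull out the substring covered by each match's overall range.
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}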
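
For comparison, the rabbit hole of avoiding NSString altogether can be sketched as follows. This is only an illustration under assumptions that do not hold for the Swift 3 Linux toolchain described here: it assumes the class is spelled NSRegularExpression and that Foundation's `Range(_:in:)` initializer is available (Swift 4 or later). The function name `matchesWithoutNSString` is hypothetical.

```swift
import Foundation

// Hypothetical variant of matches(for:in:) that never creates an NSString.
// Assumes a toolchain with NSRegularExpression and the Range(_:in:) initializer.
func matchesWithoutNSString(for pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    // NSRange is measured in UTF-16 code units, so use the utf16 view's count.
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    var found: [String] = []
    for result in regex.matches(in: text, options: [], range: fullRange) {
        // Convert the NSRange back into a Range<String.Index>; skip it if that fails.
        if let range = Range(result.range, in: text) {
            found.append(String(text[range]))
        }
    }
    return found
}

print(matchesWithoutNSString(for: "[0-9]+", in: "abc 123 def 4567"))  // prints ["123", "4567"]
```

Whether this is any simpler than instantiating the NSString is debatable; it mainly trades one conversion for another.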

I’m not sure why these differences, in particular, exist between compilation on the two platforms. The Linux version of the code does compile and run on macOS, with the sole exception that RegularExpression has to be changed back to NSRegularExpression.
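
The function above returns each whole match rather than the individual capture groups. A rough sketch of a capture-group variant is below. It is not the Stack Overflow answer's code: the name `captureGroups` is hypothetical, and it assumes a toolchain where the class is spelled NSRegularExpression and the match result offers `range(at:)` (Swift 4 or later; in Swift 3 the method was `rangeAt(_:)`).

```swift
import Foundation

// Hypothetical helper: for each match, returns the text of every capture group.
func captureGroups(for pattern: String, in text: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    let nsString = NSString(string: text)
    let fullRange = NSRange(location: 0, length: nsString.length)
    var groups: [[String]] = []
    for result in regex.matches(in: text, options: [], range: fullRange) {
        var parts: [String] = []
        // Range 0 is the whole match; capture groups start at index 1.
        for index in 1..<result.numberOfRanges {
            let range = result.range(at: index)
            // A group that did not take part in the match reports NSNotFound.
            parts.append(range.location == NSNotFound ? "" : nsString.substring(with: range))
        }
        groups.append(parts)
    }
    return groups
}

print(captureGroups(for: "(\\w+)@(\\w+)\\.com", in: "bob@example.com, amy@test.com"))
// prints [["bob", "example"], ["amy", "test"]]
```

Returning an empty string for a group that did not participate is just one possible choice; an optional would make the distinction explicit.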