Regular Expression Capture Groups in Swift 3 on Linux

I recently decided to have a go at using Swift. I’ve dabbled with Swift before and run through the odd tutorial, but this time I wanted to actually use Swift to accomplish something. I tend to internalize programming languages far better that way. And since Swift is open source and touted as a contender for server-side development, I wanted to try Swift on Linux.

As I got going, I found I needed to use regular expression capture groups to find some parts of a string and then return an array of those parts. In most modern programming languages, this would be a perfectly straightforward exercise. Nearly all programming languages these days include a regular expression engine of some sort in their standard libraries, and a little Googling will tell you all you need to know about the peculiarities of that language’s regular expression syntax and the APIs for using it. Swift, though, is a bit different.

Swift’s own standard library (by which I mean the included objects and functions that are Swift from the ground up and are not derived from Objective-C predecessors) is actually quite small. It consists mostly of built-in types like Arrays and Strings. Much of what else you reach for, including the regular expression engine, comes from Foundation, whose classes are Objective-C APIs bridged into Swift. As far as I can tell, bridging to Objective-C objects works just fine for the most part; Swift was designed, after all, to interoperate easily with Objective-C. But there are corner cases where frustration flourishes. On Linux, which has no Objective-C runtime, Foundation is instead a separate reimplementation (swift-corelibs-foundation), and that likely accounts for some of the differences below.

To get started, I found a Stack Overflow answer which proposed the following Swift 3/Xcode 8 compatible function for getting the matching substrings:

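import Foundation

// Returns every substring of `text` matched by the pattern `regex`.
func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex)
        // Bridge to NSString; its length is in UTF-16 code units, as NSRange expects.
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        // $0.range is the range of the whole match.
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}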

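The main idea here is to convert the Swift String into an Objective-C NSString. NSString’s length, measured in UTF-16 code units just like NSRegularExpression’s own ranges, provides the NSRange that has to be passed to NSRegularExpression, and its substring(with:) method turns each matched range back into a string. This code does indeed compile and execute correctly on macOS using Xcode 8.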
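The reason the code measures the string through NSString rather than by counting Characters is that NSRange offsets are UTF-16 code units. The sketch below is written for a current Swift toolchain (Swift 4 or later, where text.count replaces Swift 3’s text.characters.count). It shows the two counts diverging for a string containing an emoji. The sample string is arbitrary.

```swift
import Foundation

// NSRange offsets and lengths are measured in UTF-16 code units,
// not in Swift Characters (grapheme clusters).
let sample = "Swift 👍"

let characterCount = sample.count            // 7 Characters
let utf16Count = sample.utf16.count          // 8 code units: 👍 is a surrogate pair
let nsLength = NSString(string: sample).length

// A search range built from characterCount would stop one unit short of the end.
print(characterCount, utf16Count, nsLength)
```

Because NSString’s length equals the UTF-16 count, NSRange(location: 0, length: text.utf16.count) describes the same range.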

On Linux, however, I immediately ran into problems. The first is that Swift would absolutely not allow me to use the name NSRegularExpression; I had to use the munged name RegularExpression. (The idea seems to be that if there is a Swift-native equivalent, as String is a Swift-native equivalent to NSString, then you can use the NS prefix to still refer to the Objective-C version. If there is no Swift-native equivalent, as is the case with NSRegularExpression, then you must always remove the NS prefix.)

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        // Swift 3 on Linux exposes NSRegularExpression only as RegularExpression.
        let regex = try RegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

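Once that was fixed, I got errors about missing arguments for RegularExpression and its matches method. Adding the options argument to both calls was easy, though it remains curious that it is required on Linux but not on macOS. The likely explanation is that on macOS the Objective-C importer gives option-set parameters a default value of [], whereas Foundation on Linux did not declare those defaults.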

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        // On Linux both calls need an explicit options argument, even if empty.
        let regex = try RegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}
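The two options parameters take option sets: NSRegularExpression.Options for the initializer and NSRegularExpression.MatchingOptions for matches. Passing [] simply means "no options". The sketch below assumes a current toolchain, where the class is spelled NSRegularExpression on Linux as well. It passes a non-empty value to show the parameter doing something. The function name caseInsensitiveMatches is hypothetical.

```swift
import Foundation

// Hypothetical helper: like matches(for:in:), but the pattern is compiled with a
// non-empty NSRegularExpression.Options set. Invalid patterns yield [] (error discarded).
func caseInsensitiveMatches(for pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    let nsString = NSString(string: text)
    // [] here is an empty NSRegularExpression.MatchingOptions set.
    let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
    return results.map { nsString.substring(with: $0.range) }
}

let found = caseInsensitiveMatches(for: "swift", in: "Swift, SWIFT and swift")
print(found)
```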

The more vexing problem was that I got an error regarding the conversion of String to NSString: “cannot convert value of type 'String' to type 'NSString' in coercion”. Initially, I assumed that NSString was simply not available on Linux, and I went down quite a rabbit hole trying to get an NSRange from a String so that I could avoid using NSString altogether. It turns out, however, that doing so was unnecessary. NSString is indeed available in Swift on Linux, and although the type coercion mysteriously doesn’t work, instantiating an NSString manually does:

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try RegularExpression(pattern: regex, options: [])
        // `text as NSString` is rejected on Linux, so build the NSString explicitly.
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}
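Getting an NSRange straight from a String has a much shorter answer in later Swift versions. As far as I know, Swift 4 added NSRange(_:in:) and Range(_:in:) to Foundation for converting between String.Index ranges and NSRange. With those, a sketch for a current toolchain can skip NSString entirely. None of this applies to the Swift 3 setup described here, and the helper name matchesWithoutNSString is hypothetical.

```swift
import Foundation

// Hypothetical helper mirroring matches(for:in:), but converting between String
// ranges and NSRange directly. Invalid patterns yield [] (error discarded).
func matchesWithoutNSString(for pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    // NSRange covering the whole string, measured in UTF-16 code units.
    let searchRange = NSRange(text.startIndex..., in: text)
    let results = regex.matches(in: text, options: [], range: searchRange)
    // Map each NSRange back to a Range<String.Index>; ranges that do not
    // correspond to valid indices in `text` are dropped.
    return results.compactMap { result -> String? in
        guard let range = Range(result.range, in: text) else { return nil }
        return String(text[range])
    }
}

let words = matchesWithoutNSString(for: "[a-z]+", in: "one 👍 two")
print(words)
```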

I’m not sure why these particular differences exist between the two platforms. The Linux version of the code does compile and run on macOS, with the sole exception that RegularExpression has to be changed back to NSRegularExpression.
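All of the versions above return the text of each whole match ($0.range), not of individual capture groups. Extracting the groups means walking the extra ranges stored in each NSTextCheckingResult. The sketch below targets a current toolchain and uses a hypothetical helper name. On macOS with Swift 3, the per-group accessor was spelled rangeAt(_:); Swift 4 renamed it range(at:).

```swift
import Foundation

// Hypothetical helper: returns, for every match, the text of each capture group
// in order. Group 0 (the whole match) is skipped. A group that did not take part
// in a match (e.g. an optional group) is returned as nil. Invalid patterns yield [].
func captureGroups(for pattern: String, in text: String) -> [[String?]] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    let nsString = NSString(string: text)
    let fullRange = NSRange(location: 0, length: nsString.length)
    let results = regex.matches(in: text, options: [], range: fullRange)

    return results.map { result -> [String?] in
        // numberOfRanges is 1 (the whole match) plus one per capture group.
        return (1..<result.numberOfRanges).map { index -> String? in
            let groupRange = result.range(at: index)
            // Non-participating groups report a location of NSNotFound.
            guard groupRange.location != NSNotFound else { return nil }
            return nsString.substring(with: groupRange)
        }
    }
}

let pairs = captureGroups(for: "(\\w+)=(\\d+)", in: "a=1 b=22 c=3")
print(pairs)
```

Returning [[String?]] rather than a flattened [String] keeps each group at its position, so a missing optional group does not shift the later ones.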