I have the following function:

static func replaceAtSignNotation(_ text : String) -> String {
    var source = text
    let wholePattern = "@\\[[a-z0-9-\\-]+\\]\\((\\w+)\\)"
    let typePattern = "(?<=]\\()[^)]+(?=\\))"

    if let wholeRange = source.range(of: wholePattern, options: .regularExpression) {
        if let typeRange = source.range(of: typePattern, options: .regularExpression) {
            let username = source[typeRange]
            source.replaceSubrange(wholeRange, with: "@\(username)")
        }
    } else {
        return text
    }
    return replaceAtSignNotation(source)
}

This does an excellent job of finding patterns such as:

@[a12-3asd-32](john) 
@[b12-32d1-23](martha)

It lets me capture the username, but some usernames contain a '-', such as:

@[c12-12d1-13](john-user-1)

My current regex does not capture those cases. Any idea how I can adapt it to capture them as well?

Ignacio Oroná
    Why use two patterns? Why not using a `NSRegularExpression` and use groups to separate what you want? – Larme Aug 30 '18 at 20:28

1 Answer

You may change the first regex to

let wholePattern = "@\\[[a-z0-9-]+\\]\\((\\w+(?:-\\w+)*)\\)"
                                             ^^^^^^^^^^

See the regex demo.

Or, if the hyphens can appear anywhere in the username and can follow one another, you may also use

let wholePattern = "@\\[[a-z0-9-]+\\]\\(([\\w-]+)\\)"
                                         ^^^^^^^

See another regex demo.
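
To make the difference between the two patterns concrete, here is a small sketch that runs both against a username with single hyphens and one with doubled hyphens. The `matches` helper, the constant names, and the `john--user--1` sample are assumptions made for illustration; only the two pattern strings come from this answer.

```swift
import Foundation

// Both pattern strings are copied from the answer; everything else is illustrative.
let strictPattern = "@\\[[a-z0-9-]+\\]\\((\\w+(?:-\\w+)*)\\)"
let laxPattern = "@\\[[a-z0-9-]+\\]\\(([\\w-]+)\\)"

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
func matches(_ text: String, _ pattern: String) -> Bool {
    return text.range(of: pattern, options: .regularExpression) != nil
}

let single = "@[c12-12d1-13](john-user-1)"    // hyphens only between word chars
let doubled = "@[c12-12d1-13](john--user--1)" // consecutive hyphens

print(matches(single, strictPattern))  // true
print(matches(single, laxPattern))     // true
print(matches(doubled, strictPattern)) // false: "--" is not a hyphen followed by a word char
print(matches(doubled, laxPattern))    // true
```

Which one fits depends on the actual username rules, which the question does not spell out.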

Details

  • @\[ - a literal @[ substring
  • [a-z0-9-]+ - 1+ lowercase ASCII letters, digits or -
  • \]\( - a ]( substring
  • (\w+(?:-\w+)*) - Group 1:
    • \w+ - 1 or more letters, digits or _
    • (?:-\w+)* - zero or more sequences of
      • - - a hyphen
      • \w+ - 1+ word chars
  • [\w-]+ - (second pattern only, in place of the contents of Group 1) 1 or more word or - chars
  • \) - a ) char.
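
Since Group 1 already holds the username, one possible simplification is to do the whole replacement in a single pass with a `$1` template instead of recursing with a second pattern. This is only a sketch, not the function from the question: the name `replaceMentions` and the sample input are made up, and it assumes the stricter pattern from this answer is the one you want.

```swift
import Foundation

// Hypothetical single-pass variant: rewrites every @[id](name) as @name.
func replaceMentions(_ text: String) -> String {
    // First pattern from the answer; Group 1 captures the username.
    let pattern = "@\\[[a-z0-9-]+\\]\\((\\w+(?:-\\w+)*)\\)"
    // With .regularExpression, "$1" in the replacement refers to Group 1.
    return text.replacingOccurrences(of: pattern,
                                     with: "@$1",
                                     options: .regularExpression)
}

// Made-up sample input.
let input = "Hi @[a12-3asd-32](john) and @[c12-12d1-13](john-user-1)!"
print(replaceMentions(input))
// Hi @john and @john-user-1!
```

Text without any match is returned unchanged, so no separate early-return branch is needed.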
Wiktor Stribiżew
  • Is there an advantage in using a non capturing group rather than just using a character set for `\w` and `-` ? – Paolo Aug 30 '18 at 20:33
  • @UnbearableLightness The example is `john-user-1`. So, I assume the `-` should be between word chars only. – Wiktor Stribiżew Aug 30 '18 at 20:34
  • But why a non-capturing group? Why not just do `\w[\w-]*`? – Abion47 Aug 30 '18 at 20:39
  • @Abion47 We do not know the exact rules, so `\w[\w-]*` might also work for OP. It won't match names starting with `-`. A non-capturing group is used to just group a pattern, not to remember it (as the value captured won't be accessed later). – Wiktor Stribiżew Aug 30 '18 at 20:41
  • But even if the usernames _could_ start with a hyphen, OP could just use `[\w-]+` (which I see you've edited into your answer). What does a non-capturing group add to the pattern? – Abion47 Aug 30 '18 at 20:45
  • @Abion47 This has been thoroughly discussed at SO, see [What is a non-capturing group? What does (?:) do?](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do) and [Are non-capturing groups redundant?](https://stackoverflow.com/questions/31500422/are-non-capturing-groups-redundant). – Wiktor Stribiżew Aug 30 '18 at 20:47
  • I know what a non-capturing group _does_ and how it differs from capturing ones. I'm asking you what specifically it is doing _here_ that justifies its use over one of the alternatives I mentioned. It's making the pattern both more complex and less readable, and there's nothing in the pattern or OP's use of the matches that would make the additional grouping useful. Heck, you yourself even removed it from the second pattern, so why use it in the first? – Abion47 Aug 30 '18 at 20:53
  • It is removed from the second pattern because there is nothing to group. See my [first comment here](https://stackoverflow.com/questions/52104882/swift-regex-function-update/52104921?noredirect=1#comment91161083_52104921), if we need to match names that start with a word char, end with a word char, and `-` can be between word chars only, then `\w+(?:-\w+)*` is a natural solution. – Wiktor Stribiżew Aug 30 '18 at 20:56
  • OP didn't mention any of those restrictions. All he said is that the pattern needs to match usernames that may or may not contain hyphens. Assuming grouping means your pattern will not match a user named `john--user--1`. Whereas the non-grouping methods `[\w-]+` or `\w[\w-]*` would match both cases. – Abion47 Aug 30 '18 at 20:59
  • That is why there are two solutions: 1) the one that seems to be the best one for this current type of strings, 2) a less restrictive in case the example OP chose is just too specific, and the actual rules are lax. – Wiktor Stribiżew Aug 30 '18 at 21:03