0

I can't figure out what these two regexes match:

A: "\\/\\/c\\/(\\d*)"

B: "\\/\\/(\\d*)"

I suppose they match some kind of number sequence, since \d matches any digit, but I'd like to see an example of a string that each regex would match.

The pattern syntax is that specified by ICU. Expressions are created with NSRegularExpression in an iOS app and are correct.

Rafał Sroka
  • What does "ICU" refer to? – aliteralmind Mar 03 '14 at 21:06
  • @aliteralmind Probably http://userguide.icu-project.org/strings/regexp – user49740 Mar 03 '14 at 21:07
  • You could always learn regex and figure it out – Gusdor Mar 03 '14 at 21:07
  • The first one matches `\/\/c\/` followed by (`\` and then zero or more instances of `d`), as a group. The second is the same except it matches `\/\/` instead of `\/\/c\/`. – Blorgbeard Mar 03 '14 at 21:08
  • Add a language and regex tag (Java, PCRE, etc.) and an explanation of what you're referring to with ICU. Without them, your post is very difficult to answer. – Ken White Mar 03 '14 at 21:33
  • Sorry mates, I updated my question and provided more details. – Rafał Sroka Mar 03 '14 at 21:40
  • `"\\/\\/c\\/(\\d*)"` matches double quote + esc + fwdslash + esc + fwdslash + c + esc + fwdslash + esc + many 'd's + double quote. –  Mar 03 '14 at 21:47
  • Does this answer your question? [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – miken32 Oct 19 '22 at 14:47

4 Answers

4

The first matches //c/ + 0 or more digits. The second matches // + 0 or more digits. In both the digits are captured.
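
A quick way to check this is to run both patterns against a few inputs. The sketch below uses Java's `java.util.regex` rather than ICU/NSRegularExpression, on the assumption that the two engines treat `\/` and `\d*` the same way; the class name `SlashDigitsDemo` and the helper `firstGroup` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlashDigitsDemo {
    // The two patterns from the question, written as Java string literals.
    static final Pattern A = Pattern.compile("\\/\\/c\\/(\\d*)");
    static final Pattern B = Pattern.compile("\\/\\/(\\d*)");

    // Hypothetical helper: capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup(A, "//c/123"));  // 123
        System.out.println(firstGroup(B, "//12345"));  // 12345
        System.out.println(firstGroup(A, "//c/"));     // empty string: \d* can match zero digits
        System.out.println(firstGroup(A, "//12345"));  // null: pattern A requires the "c/" part
    }
}
```

Because `\d*` also accepts zero digits, a bare `//c/` (for A) or `//` (for B) is already a match, with an empty capture group.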

John Kugelman
2
  • An example of a match for A) is //c/123
  • An example of a match for B) is //12345
Erik Duymelinck
1

This regex matches an odd sequence of characters which, at first glance, almost looks like a regex itself: \d is a digit, so \d followed by an asterisk (\d*) would mean zero or more digits. But here it's not a digit, because the escape-slash is itself escaped.

\\/\\/c\\/(\\d*)

So, for instance, this one matches the following text:

\/\/c\/\
\/\/c\/\d
\/\/c\/\dd
\/\/c\/\ddd
\/\/c\/\dddd
\/\/c\/\ddddd
\/\/c\/\dddddd
...    

This one is almost the same

\\/\\/(\\d*)

except you just delete the c\/ from the above results:

\/\/\
\/\/\d
\/\/\dd
\/\/\ddd
\/\/\dddd
\/\/\ddddd
\/\/\dddddd
...

In both cases, the final \ and the optional run of d's form capture group one.

My first impression was that these regexes were escaped for use in Java strings, which would make them completely invalid. If they were escaped for Java strings, such as

Pattern p = Pattern.compile("\\/\\/c\\/(\\d*)");

It would be invalid, because after un-escaping, it would result in this invalid regex:

\/\/c\/(\d*)

The single escape-slashes (\) are invalid. But the \d is valid, as it would mean any digit.

But again, I don't think they're invalid, and they're not escaped for a Java string. They're just odd.

miken32
aliteralmind
  • I think the double `\\` are actually an escape char and a backslash, like you'd use in C or C#. (You have what I read literally at first, until seeing that the matched text made no sense.) I think Erik and John have it. – Ken White Mar 03 '14 at 21:18
  • You're saying this is the C or C# equivalent of `Pattern p = Pattern.compile("\\/\\/c\\/(\\d*)");`? – aliteralmind Mar 03 '14 at 21:28
  • I'm pretty sure that's the case. Using the regex tree generated by RegexBuddy, removing one of the doubled \ in each case makes the regex matches much more likely. (Not downvoting, because the user hasn't done anything to clarify (like add the language being used).) – Ken White Mar 03 '14 at 21:32
  • 1
    So this is "wrong", but I think it should be left here (along with these corrective comments), as other non-C/C# people may find it instructive. *Don't down-vote it and make me delete it, hear?!* (that's to everyone but @KenWhite :) – aliteralmind Mar 03 '14 at 21:34
  • Even in C/C++ or Java, or any double quoted rules language, the language gets first shot at `\\/\\/c\\/(\\d*)`, presents `\/\/c\/(\d*)` to the regex engine, which sees it as `//c/(\d*)` –  Mar 03 '14 at 21:53
  • But before knowing this C/C# convention, I would have said `\\/` is just escaping a character that doesn't need to be escaped. And therefore the escape-character itself is left hanging--and therefore is illegal. I'm guessing I'm wrong on that, too, but that was my thinking. – aliteralmind Mar 03 '14 at 22:08
  • Not an error. When it is presented to the engine (already in a string var) any non-meta context __escaped character__ is a literal, the escape stripped. Thus is the case for blindly escaping strings for injection into regex's. –  Mar 03 '14 at 22:42
  • "Thus is the case for blindly escaping strings for injection into regex's". What does that mean? – aliteralmind Mar 03 '14 at 22:50
  • 1
    str1 = `"Find '{44}' <- this literal"` ; $str_regex = `"\\("` + str1 + `"\\)"` ; –  Mar 03 '14 at 23:01
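
The two unescaping layers described in these comments can be checked directly. This is a minimal sketch in Java, assuming the question's strings are source-code literals and that `java.util.regex` handles `\/` the way ICU does; `EscapeLayersDemo` and `fullyMatches` are illustrative names only.

```java
import java.util.regex.Pattern;

public class EscapeLayersDemo {
    // Layer 1: the Java compiler turns each "\\" in the literal into one backslash.
    static final String REGEX_TEXT = "\\/\\/c\\/(\\d*)";

    // Hypothetical helper: does the whole input match the given regex text?
    static boolean fullyMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // What the regex engine actually receives.
        System.out.println(REGEX_TEXT);           // \/\/c\/(\d*)
        System.out.println(REGEX_TEXT.length());  // 12

        // Layer 2: the engine reads "\/" as a literal slash, so the pattern
        // behaves like one written without those escapes.
        System.out.println(fullyMatches(REGEX_TEXT, "//c/42"));      // true
        System.out.println(fullyMatches("//c/(\\d*)", "//c/42"));    // true

        // Text made of literal backslashes and the letter d is not matched.
        System.out.println(fullyMatches(REGEX_TEXT, "\\/\\/c\\/\\ddd"));  // false
    }
}
```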
1

When I use Cygwin, which emulates Bash on Windows, I sometimes run into situations where I have to escape my escape characters, which is what I think makes this expression look so weird. For instance, when I use sed to look for a single '\', I sometimes have to write it as '\\\\'. (Funny, StackOverflow proved my point: if you write four backslashes in a comment, it only shows two. So if you process it again, they might all disappear, depending on your situation.)

Considering this, it might be helpful to think of each pair of backslashes as representing only one, if you're coming from a similar situation. My guess is that you are. Because of this, I would say Erik Duymelinck is probably spot on. This will capture a sequence of digits that follows a couple of slashes and, in the first pattern, a c:

//c/000

//00000
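
The same "pairs collapse to one" rule can be seen in a language with string literals. The sketch below is Java, not sed or Cygwin, so it only illustrates the analogous counting; `BackslashCountDemo` and `found` are hypothetical names.

```java
import java.util.regex.Pattern;

public class BackslashCountDemo {
    // Hypothetical helper: is the regex (as it looks after Java string
    // unescaping) found anywhere in the input?
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String oneBackslash = "\\";  // Java literal for a single \ character
        System.out.println(oneBackslash.length());        // 1

        // Four backslashes in source -> two in the string -> the regex \\,
        // which matches one literal backslash.
        System.out.println(found("\\\\", oneBackslash));  // true

        // Two in source -> one in the string -> the regex \d, i.e. a digit.
        System.out.println(found("\\d", "7"));            // true
        System.out.println(found("\\d", "\\d"));          // false: input is a backslash plus d
    }
}
```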

Andrew