
Specifically when does ^ mean "match start" and when does it mean "not the following" in regular expressions?

From the Wikipedia article and other references, I've concluded that it means the former at the start of the pattern and the latter when used inside brackets, but how does the program handle the case where the caret is at the start of the pattern and immediately before a bracket? What does, say, ^[b-d]t$ match?

dokgu
Sylvester V Lowell

2 Answers


^ only means "not the following" when inside and at the start of [], so [^...].

When it's inside [] but not at the start, it means the actual ^ character.

When it's escaped (\^), it also means the actual ^ character.

In all other cases it means start of the string or line (which one is language or setting dependent).

So in short:

  • [^abc] -> not a, b or c
  • [ab^cd] -> a, b, ^ (character), c or d
  • \^ -> a ^ character
  • Anywhere else -> start of string or line.
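
The four cases above can be checked with a short sketch. It assumes Python's `re` module purely for illustration (the question does not name a language), and the sample strings are made up:

```python
import re

# [^abc]: caret first inside the brackets negates the class
negated = re.findall(r"[^abc]", "abcd^")

# [ab^cd]: caret inside the brackets but not first is a literal ^
literal_in_class = re.findall(r"[ab^cd]", "x^ay")

# \^: an escaped caret is a literal ^
escaped = re.findall(r"\^", "2^3")

# Anywhere else: an anchor. In Python it means start of string by default
# and start of each line when re.MULTILINE is set.
text = "cat\ncar"
anchor_string = re.findall(r"^ca.", text)
anchor_line = re.findall(r"^ca.", text, flags=re.MULTILINE)

print(negated, literal_in_class, escaped, anchor_string, anchor_line)
```

The last two calls show the "string or line" distinction: the same pattern finds only `cat` by default and both `cat` and `car` in multiline mode.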

So ^[b-d]t$ means:

  • Start of line
  • b/c/d character
  • t character
  • End of line
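
As a quick check of that reading, here is a minimal sketch, again assuming Python's `re` module; the candidate strings are invented for the example:

```python
import re

# The caret is outside the brackets, so it is an anchor, not a negation.
pattern = re.compile(r"^[b-d]t$")

candidates = ["bt", "ct", "dt", "at", "^t", "bt ", "abt", "b"]
matches = [s for s in candidates if pattern.search(s)]

print(matches)
```

Only the two-character strings made of b, c or d followed by t are accepted; `^t` is rejected because the caret is never treated as a literal here.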
Mikhail2048
Bernhard Barker
  • `When it's inside [] but not at the start, it means the actual ^ character.` different possibility in Java. –  Nov 17 '19 at 16:47
  • `In all other cases it means start of the string / line (which one is language / setting dependent).` It's not really dependent; the meaning is specific to the regex engine, and they're mostly all the same on this. –  Nov 17 '19 at 16:48
  • `[^\^]` not caret! – K0D4 Dec 09 '21 at 21:29
  • What about using a caret in PHP regular expressions to indicate that the expression reaches the end? – limakid Jun 23 '22 at 00:41
  • @K0D4 : if you want "not caret", just use `[^^]` and skip the backslash altogether – RARE Kpop Manifesto Aug 01 '23 at 16:50

Going to ignore block comments? OK. The pattern ^\s* might be a bad choice because \s also matches line breaks, so it can span lines. Check whether .NET supports the horizontal-whitespace class \h; if not, [^\S\r\n] works as well. You can use the multiline inline modifier (?m) (or RegexOptions.Multiline), which changes ^ from meaning beginning of string (the default) to beginning of line. So it ends up being (?m)^\h*(#), and the capture group gives the position of the #. Alternatively, (?m)(?<=^\h*)# works just as well, and the position of the match itself is the offset.

For complete regex information, see https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference

Note that ^\s* will work of course, but it matches a lot of unnecessary cruft that can span lines.
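
The difference can be seen in a small sketch. It is written in Python rather than .NET, which is an assumption made only for illustration: Python's `re` module has no \h and does not allow variable-length lookbehind, so the sketch uses the [^\S\r\n] form with a capture group. The sample `source` text is hypothetical.

```python
import re

# Hypothetical input: a blank line sits right before an indented comment.
source = "x = 1\n\n   # indented comment\n# top-level comment\n"

# \s also matches line breaks, so this match can begin on the blank line.
greedy = re.compile(r"(?m)^\s*(#)")

# [^\S\r\n] is "whitespace except CR and LF", so the match stays on one line.
horizontal = re.compile(r"(?m)^[^\S\r\n]*(#)")

# For each match, record where the whole match starts and where the # is.
greedy_spans = [(m.start(), m.start(1)) for m in greedy.finditer(source)]
horizontal_spans = [(m.start(), m.start(1)) for m in horizontal.finditer(source)]

print(greedy_spans)
print(horizontal_spans)
```

Both patterns report the same position for the `#` through the capture group, but the first match of the `\s*` version starts one line earlier, on the blank line, which is the cruft mentioned above.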

Wiktor Stribiżew