233

The hyphen is a special character in regex. For instance, to select a range, I could do something like:

[0-9A-F]

But outside of square brackets it's just a regular character, right? I've tested this on a couple of online regex testers, and hyphens seem to behave as normal characters outside of square brackets (or even inside square brackets when not between two characters, e.g. [-g] seems to match - or g), whether escaped or not. I couldn't find an answer to this, so I'm wondering whether it is conventional to escape hyphens.

Henry Ecker
JSideris
  • 5
    It depends on which language you use to represent irregular expressions. – zzzzBov Mar 06 '12 at 17:47
  • 3
    here's a similar post that should answer your questions: http://stackoverflow.com/a/4068725/56829 – Omer Bokhari Mar 06 '12 at 17:47
  • 11
    I don't see how this is an exact duplicate. That question is asking HOW to escape hyphens. I already know how to escape them and am asking WHETHER escaping them is necessary. The fact that some of the answers overlap is irrelevant because the nature of the questions is different. Please re-open. – JSideris Mar 09 '12 at 06:23
  • 5
    The supposed dupe refers to a specific language only and so do some of its answers. – jwg Feb 17 '14 at 11:59
  • In the Unix grep program you always have to escape `-`, no matter where it stands in a pattern. – bloody Mar 22 '20 at 11:35

3 Answers

351

Correct on all fronts. Outside of a character class (that's what the "square brackets" are called) the hyphen has no special meaning. Within a character class, you can place a hyphen as the first or last character in the class (e.g. [-a-z] or [0-9-]), or escape it (e.g. [a-z\-0-9]), in order to add "hyphen" to your class.
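A quick way to check these placements is a small script. The sketch below uses Python's `re` module, which is an assumption on my part since the question names no language, and other engines can behave differently. The helper name `matches` is made up for illustration.

```python
import re

# Outside a character class, "-" is an ordinary literal, escaped or not.
outside_plain = re.fullmatch(r"a-z", "a-z") is not None
outside_escaped = re.fullmatch(r"a\-z", "a-z") is not None

# Three ways to include a literal hyphen inside a character class.
hyphen_first = re.compile(r"[-a-z]")
hyphen_last = re.compile(r"[0-9-]")
hyphen_escaped = re.compile(r"[a-z\-0-9]")

# For comparison: a plain range that does not include the hyphen.
range_only = re.compile(r"[a-z]")


def matches(pattern, text):
    """Return True if the whole of `text` matches the compiled `pattern`."""
    return pattern.fullmatch(text) is not None


print(outside_plain, outside_escaped)
for pattern in (hyphen_first, hyphen_last, hyphen_escaped, range_only):
    print(pattern.pattern, matches(pattern, "-"))
```

The first three classes accept a lone hyphen, while the plain `[a-z]` range does not.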

It's more common to find a hyphen placed first or last within a character class, but by no means will you be lynched by hordes of furious neckbeards for choosing to escape it instead.

(Actually... my experience has been that a lot of regex is employed by folks who don't fully grok the syntax. In these cases, you'll typically see everything escaped (e.g. [a-z\%\$\#\@\!\-\_]) simply because the engineer doesn't know what's "special" and what's not... so they "play it safe" and obfuscate the expression with loads of excessive backslashes. You'll be doing yourself, your contemporaries, and your posterity a huge favor by taking the time to really understand regex syntax before using it.)
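To see that the extra backslashes add nothing, here is a rough comparison in Python's `re` module (again an assumed choice of engine; Python tolerates a backslash before punctuation, but some engines and linters do not). The names `over_escaped` and `lean` are just labels for this sketch.

```python
import re

# The over-escaped class from the paragraph above.
over_escaped = re.compile(r"[a-z\%\$\#\@\!\-\_]")

# The same class with no backslashes; the hyphen is placed last.
lean = re.compile(r"[a-z%$#@!_-]")

# Compare the two classes on every ASCII character.
ascii_chars = [chr(code) for code in range(128)]
same_behavior = all(
    (over_escaped.fullmatch(ch) is None) == (lean.fullmatch(ch) is None)
    for ch in ascii_chars
)

print(same_behavior)
```

Both patterns accept exactly the same characters, so the shorter one is the easier one to read.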

Great question!

djb
Chris Tonkinson
  • 5
    Interesting point about excessive escaping by those who don't fully understand and want to "play it safe" – user Sep 10 '14 at 15:02
  • 2
    A very useful answer. Turns out that in Eclipse Luna, the Java Linter will complain if you try to escape it. – Keab42 Nov 19 '14 at 12:51
  • 7
    I think someone could argue that `obfuscate the expression with loads of excessive backslashes` might actually be backwards. I think *most* people who use regex don't fully grok the syntax. In that case the excessive backslashes might make it more clear for most people. That's not to say it's the right way to do things, but there's at least an argument that could be made for that position. – Shiania White Jun 01 '18 at 18:37
  • @ShianiaWhite I'm not sure I agree. I think I understand your point, but take the following expression as a trivialized counterpoint: `var x = 4 * 4 + 1` vs `var x = (((4) * (4)) + (1))`. In both cases, I'm setting `x = 17` but in the second version I added extra parentheses around everything just to make it more clear - except that it does exactly the opposite. – Chris Tonkinson Jun 04 '18 at 13:20
  • 2
    @ChrisTonkinson: In that case it doesn't, of course. But what I was assuming is missing knowledge by the reader that the redundancy explains. So taking your example, in `var x = (4 * 4) + 1` there are redundant parentheses. But if the reader doesn't know order of operations, then these parentheses *do* make it more clear. My point was not that any redundancy makes things more clear, but redundancy *can* make things more clear in cases where the reader doesn't know. – Shiania White Jun 04 '18 at 22:19
  • 1
    Yes, I think I do see your point, however I'm forced to disagree again on your conclusion, if only on the basis that my original assertion was a plea to "[take] the time to really understand regex syntax before using it," and the observation that all too often, people fail to do exactly that. – Chris Tonkinson Jun 13 '18 at 20:37
  • Warning! The escaped form (e.g. `[a-z\-0-9]`) doesn't work in Oracle (11g). It does not match the hyphen. – m_and_m Feb 13 '19 at 12:30
  • In the Unix `grep` program you always have to escape '`-`' to get the literal, no matter where it stands in a pattern. – bloody Mar 22 '20 at 11:36
20

Outside of character classes, it is conventional not to escape hyphens. If I saw an escaped hyphen outside of a character class, that would suggest to me that it was written by someone who was not very comfortable with regexes.

Inside character classes, I don't think one way is conventional over the other. In my experience, the usual approach is to put the hyphen either first or last, as in [-._:] or [._:-], to avoid the backslash; but I've also often seen it escaped instead, as in [._\-:], and I wouldn't call that unconventional.
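As a sanity check that the three spellings are interchangeable, here is a small sketch in Python's `re` module (an assumed engine, not something the question specifies). It also includes `[.-:]`, a variant I added to show what happens when the hyphen lands between two characters by accident. The helper `accepted` is hypothetical.

```python
import re
import string


def accepted(class_pattern):
    """Return the set of printable ASCII characters the class matches."""
    compiled = re.compile(class_pattern)
    return {ch for ch in string.printable if compiled.fullmatch(ch)}


first = accepted(r"[-._:]")     # hyphen first
last = accepted(r"[._:-]")      # hyphen last
escaped = accepted(r"[._\-:]")  # hyphen escaped in the middle

# Unescaped hyphen in the middle: a range from "." to ":" in ASCII order.
accidental_range = accepted(r"[.-:]")

print(sorted(first), first == last == escaped)
print(sorted(accidental_range))
```

The accidental range covers `.`, `/`, the digits, and `:`, and it no longer matches the hyphen itself, which is the main reason to keep the hyphen at an edge or escape it.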

ruakh
  • Bash does not process regex patterns completely, or does so in an older way, so an escaped hyphen there does not mean someone is not comfortable with regex. That is just the Bash way. –  Jan 31 '23 at 06:58
12

Typically you would put the hyphen first in the [] match section. E.g., to match any alphanumeric character or a hyphen (written the long way), you would use [-a-zA-Z0-9]
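For example, that class could back a simple token check. This is only a sketch in Python's `re` module; the `+` quantifier, the name `TOKEN`, and the helper `is_token` are my own assumptions for illustration.

```python
import re

# One or more alphanumerics or hyphens; the hyphen comes first in the class.
TOKEN = re.compile(r"[-a-zA-Z0-9]+")


def is_token(text):
    """Return True if `text` consists only of ASCII letters, digits, and hyphens."""
    return TOKEN.fullmatch(text) is not None


for sample in ("well-known-2", "under_score", "-"):
    print(sample, is_token(sample))
```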

Aage
Wes Hardaker