0

What does this regular expression mean? It is in an XML schema that I am using:

([!-~]|[ ])*[!-~]([!-~]|[ ])*

-Dave

Alan Moore
Dave

5 Answers

3

Take it in parts. Here's the first part:

([!-~]|[ ])*

This means any number (*), including zero, of characters that are either between ! and ~ (inclusive; if you look up ! and ~ in an ASCII table, this range turns out to be all of the printable ASCII characters except the space) or a space.

Here's the second part:

[!-~]

This means exactly one character between ! and ~ (so not a space).

Here's the last part:

([!-~]|[ ])*

This means the same thing as the first part.

So this regular expression will match any string of printable ASCII characters, including spaces, provided there is at least one non-space character in the string.
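
To try this out, here is a minimal Python sketch. It assumes that Python's `re` module treats this particular pattern the same way an XML Schema processor would. XML Schema patterns must match the whole value, so `re.fullmatch` is used to mimic that. The helper name `is_valid` and the sample strings are made up for illustration.

```python
import re

# The pattern from the question. XML Schema patterns are implicitly anchored,
# so fullmatch() is used to require that the whole string matches.
PATTERN = re.compile(r"([!-~]|[ ])*[!-~]([!-~]|[ ])*")


def is_valid(value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # Samples: ordinary text, padded text, empty, spaces only,
    # a tab (outside the range), and a non-ASCII letter.
    for sample in ["hello world", " x ", "", "   ", "tab\there", "caf\u00e9"]:
        print(repr(sample), is_valid(sample))
```

Only the first two samples should be accepted: the empty string and the all-spaces string lack a non-space character, and the last two contain characters outside the allowed range.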

Dominic Cooney
2

The answers you've gotten seem to have missed one of the fundamentals of REs: a '-' inside square brackets isn't taken to mean a literal '-' unless it's the first or last character. Instead, the '-' defines a range. The '!' is (in ASCII, ISO 8859, etc.) character code 33 -- the first "visible" printable character. Likewise, in ASCII, the '~' is code 126, the last printable character.

Therefore, the "[!-~]" matches a single printable (ASCII) character.
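
The code points are easy to confirm. The following is a small Python sketch (Python is just a convenient choice here, not something the schema depends on) that lists the range and contrasts it with a class where the '-' comes last and is therefore literal.

```python
import re

# Code points of the two endpoints of the range.
low, high = ord("!"), ord("~")
print(low, high)  # 33 126

# Every character the class [!-~] can match, in code point order.
visible = "".join(chr(code) for code in range(low, high + 1))
print(len(visible))  # 94
print(visible)

# With '-' in the middle the class is a range, so "A" (code 65) matches.
is_in_range = re.fullmatch(r"[!-~]", "A") is not None
# With '-' last it is a literal, so the class holds only '!', '~' and '-'.
is_in_literal_set = re.fullmatch(r"[!~-]", "A") is not None
print(is_in_range, is_in_literal_set)  # True False
```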

For the rest, the other answers seem reasonable.

Edit: it looks like as I was writing this, some more accurate answers were posted -- my apologies if I offended anybody by implying otherwise. As I started writing this, the answers that had been posted were wrong on this point.

Jerry Coffin
1

[!-~] Matches any of the characters between "!" and "~" (the represented characters theoretically depend on the encoding in use)

[ ] Matches a space character

(x|y) Matches one of x or y

(x)* Matches any number of consecutive occurrences of x (including none).

Romain
1

Any number of characters in the range ! to ~ or spaces, followed by exactly one character in the range ! to ~, followed by any number of characters in that same range or spaces again. So it would appear to be the same as:

([!-~ ])*[!-~]([!-~ ])*
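
One way to gain some confidence in that equivalence is a brute-force comparison. The Python sketch below is not a proof: it only checks every string up to a small length over a tiny, arbitrarily chosen alphabet, and it assumes Python's `re` behaves like the schema's regex engine for these patterns. The names `ALPHABET` and `agree_up_to` are illustrative.

```python
import itertools
import re

ORIGINAL = re.compile(r"([!-~]|[ ])*[!-~]([!-~]|[ ])*")
MERGED = re.compile(r"([!-~ ])*[!-~]([!-~ ])*")

# Small alphabet: one visible character, a space, and two characters
# outside the allowed range (a tab and a non-ASCII letter).
ALPHABET = ["a", " ", "\t", "\u00e9"]


def agree_up_to(max_len: int) -> bool:
    """Check that both patterns accept exactly the same strings up to max_len."""
    for length in range(max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            s = "".join(chars)
            if bool(ORIGINAL.fullmatch(s)) != bool(MERGED.fullmatch(s)):
                return False
    return True


if __name__ == "__main__":
    print(agree_up_to(5))
```
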
Stephen Cross
  • Or also equivalent to `([!-~]|[ ]?)+`. Note that [!-~] is actually a character range, not a list of literal characters (it matches everything between ! and ~, not just !, ~ and -). – Romain Feb 09 '10 at 22:13
  • @Romain: No, your example matches (among other incorrect things), the empty string. – Anon. Feb 09 '10 at 22:16
  • Right. Never mind the example; the second remark is still valid, though :) – Romain Feb 09 '10 at 22:17
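
The objection in the comments can be checked directly. This is a short Python sketch, assuming Python's `re` handles both patterns the way the schema's engine would; the names `ORIGINAL` and `PROPOSED` are just labels for the two patterns being compared.

```python
import re

ORIGINAL = re.compile(r"([!-~]|[ ])*[!-~]([!-~]|[ ])*")
PROPOSED = re.compile(r"([!-~]|[ ]?)+")

# Print, for each sample, whether ORIGINAL and PROPOSED accept it.
for s in ["", "   ", "a b"]:
    print(repr(s), bool(ORIGINAL.fullmatch(s)), bool(PROPOSED.fullmatch(s)))
```

The two patterns agree on "a b", but the proposed one also accepts the empty string and a string of only spaces, which the original rejects.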
1

The regular expression consists of:

  • ([!-~]|[ ])* start with zero or more characters of the range from ! (0x21) to ~ (0x7E) or the space character (0x20), so basically all printable characters from 0x21 to 0x7E plus the space character
  • [!-~] followed by a single printable character
  • ([!-~]|[ ])* followed by zero or more printable characters or the space character

So it basically says that the string must contain only printable characters (0x21 to 0x7E) or the space character (0x20), and that at least one of them must not be a space.
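
Those two conditions can also be written without a regular expression. The following Python function is a sketch of that reading, not what a schema validator actually runs; the name `is_valid` is hypothetical.

```python
def is_valid(value: str) -> bool:
    """Regex-free version of the two conditions stated above."""
    # Condition 1: every character is the space (0x20) or in 0x21..0x7E.
    only_allowed = all(0x20 <= ord(ch) <= 0x7E for ch in value)
    # Condition 2: at least one character is in 0x21..0x7E (not a space).
    has_visible = any(0x21 <= ord(ch) <= 0x7E for ch in value)
    return only_allowed and has_visible


if __name__ == "__main__":
    for sample in ["hello world", "   ", "", "line\nbreak"]:
        print(repr(sample), is_valid(sample))
```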

Gumbo