9

I need to validate input: valid variants are either a number or an empty string. What is the corresponding regular expression?

String pattern = "\\d+|<what should be here?>";

UPD: don't suggest "\d*" please; I'm just curious how to express "empty string" in a regexp.

Roman

7 Answers

18

In this particular case, ^\d*$ would work, but generally speaking, to match pattern or an empty string, you can use:

^$|pattern

Explanation

  • ^ and $ are the beginning and end of the string anchors respectively.
  • | is used to denote alternates, e.g. this|that.
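As a minimal sketch of the `^$|pattern` idea applied to the question's case, the following assumes `\d+` as the pattern; the class and method names (`NumberOrEmpty`, `isValid`) are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class NumberOrEmpty {
    // "^$" matches the empty string; "\\d+" matches one or more digits.
    private static final Pattern NUMBER_OR_EMPTY = Pattern.compile("^$|\\d+");

    // matches() requires the whole input to match one of the alternatives.
    static boolean isValid(String input) {
        return NUMBER_OR_EMPTY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(""));    // true
        System.out.println(isValid("42"));  // true
        System.out.println(isValid("4a2")); // false
    }
}
```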


Note on multiline mode

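In the so-called multiline mode (Pattern.MULTILINE/(?m) in Java), the ^ and $ match the beginning and end of each line instead. To anchor at the beginning and end of the whole string in this mode, use \A and \Z respectively (in Java, \Z also matches just before a final line terminator; \z matches only at the very end).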

If you're in multiline mode, then the empty string is matched by \A\Z instead. ^$ would match an empty line within the string.
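The difference can be sketched as follows, using `find()` so that the anchors do all the work. The class and method names are hypothetical; the patterns are the ones discussed above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class MultilineAnchors {
    // In multiline mode, ^$ can match an empty line inside a longer string.
    private static final Pattern EMPTY_LINE = Pattern.compile("(?m)^$");
    // \A and \Z stay tied to the whole input even in multiline mode.
    private static final Pattern EMPTY_INPUT = Pattern.compile("(?m)\\A\\Z");

    static boolean hasEmptyLine(String s) {
        return EMPTY_LINE.matcher(s).find();
    }

    static boolean isEmptyInput(String s) {
        return EMPTY_INPUT.matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "a\n\nb"; // contains an empty line, but is not empty
        System.out.println(hasEmptyLine(text)); // true
        System.out.println(isEmptyInput(text)); // false
        System.out.println(isEmptyInput(""));   // true
    }
}
```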


Examples

Here are some examples to illustrate the above points:

String numbers = "012345";

System.out.println(numbers.replaceAll(".", "<$0>"));
// <0><1><2><3><4><5>

System.out.println(numbers.replaceAll("^.", "<$0>"));
// <0>12345

System.out.println(numbers.replaceAll(".$", "<$0>"));
// 01234<5>

numbers = "012\n345\n678";
System.out.println(numbers.replaceAll("^.", "<$0>"));       
// <0>12
// 345
// 678

System.out.println(numbers.replaceAll("(?m)^.", "<$0>"));       
// <0>12
// <3>45
// <6>78

System.out.println(numbers.replaceAll("(?m).\\Z", "<$0>"));     
// 012
// 345
// 67<8>

Note on Java matches

In Java, matches attempts to match a pattern against the entire string.

This is true for String.matches, Pattern.matches and Matcher.matches.

This means that sometimes, anchors can be omitted for Java matches when they're otherwise necessary for other flavors and/or other Java regex methods.
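A short sketch of that difference, contrasting `Pattern.matches` with `Matcher.find`; the wrapper class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class MatchesVsFind {
    // Whole-input match: behaves as if the regex were anchored at both ends.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Substring search: anchors are needed to force a whole-input match.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("\\d+", "123abc"));      // false
        System.out.println(partialMatch("\\d+", "123abc"));    // true
        System.out.println(partialMatch("^\\d+$", "123abc"));  // false
    }
}
```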

Wiktor Stribiżew
polygenelubricants
6
/^\d*$/

Matches 0 or more digits with nothing before or after.

Explanation:

The '^' means start of line. '$' means end of line. '*' matches 0 or more occurrences. So the pattern matches an entire line with 0 or more digits.

KaptajnKold
3

To explicitly match the empty string, use \A\Z.

You can also often see ^$ which works fine unless the option is set to allow the ^ and $ anchors to match not only at the start or end of the string but also at the start/end of each line. If your input can never contain newlines, then of course ^$ is perfectly OK.

Some regex flavors don't support \A and \Z anchors (especially JavaScript).

If you want to allow "empty" as in "nothing or only whitespace", then go for \A\s*\Z or ^\s*$.
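For Java, the "nothing or only whitespace" check might look like the sketch below. The class and method names are hypothetical, and `find()` is used so that the `\A` and `\Z` anchors are what enforce the whole-string match.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class BlankCheck {
    // \A ... \Z anchor to the whole input regardless of multiline mode.
    private static final Pattern BLANK = Pattern.compile("\\A\\s*\\Z");

    static boolean isBlank(String s) {
        return BLANK.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isBlank(""));     // true
        System.out.println(isBlank("  \t")); // true
        System.out.println(isBlank(" a "));  // false
    }
}
```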

Tim Pietzcker
1

Just as a funny solution, you can do:

\d+|\d{0}

A digit, zero times. Yes, it does work.

unbeli
0

One way to view the set of regular languages is as the closure of the following rules:

  1. The special < EMPTY_STRING > is a regular language
  2. Any single symbol from the alphabet is a regular language
  3. The concatenation of two regular languages is a regular language
  4. The union of two regular languages is a regular language
  5. The Kleene closure (star) of a regular language is a regular language

A concrete regular language is a concrete element of this closure.


I couldn't find an empty-string symbol in the POSIX standard that expresses the idea from step (1).

However, POSIX does provide the question mark, which by definition is equivalent to the following:

(regexp|< EMPTY_STRING >)

So you can do the following in bash, Perl, and Python:

echo 9023 | grep -E "(1|90)?23"
perl -e "print 'PASS' if (qq(23) =~ /(1|90)?23/)"
python -c "import re; print(bool(re.match('^(1|90)?23$', '23')))"
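
The same equivalence can be checked in Java, where (unlike in POSIX) an empty alternative is allowed inside a group. This is only a sketch: the class and method names are hypothetical, and the sample inputs are chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class OptionalVsEmptyAlternative {
    // The question-mark form ...
    private static final Pattern WITH_QUESTION_MARK = Pattern.compile("(?:1|90)?23");
    // ... and the same language written with an explicit empty alternative.
    private static final Pattern WITH_EMPTY_ALTERNATIVE = Pattern.compile("(?:1|90|)23");

    static boolean viaQuestionMark(String s) {
        return WITH_QUESTION_MARK.matcher(s).matches();
    }

    static boolean viaEmptyAlternative(String s) {
        return WITH_EMPTY_ALTERNATIVE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"23", "123", "9023", "923"}) {
            System.out.println(s + ": " + viaQuestionMark(s) + " / " + viaEmptyAlternative(s));
        }
    }
}
```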
Konstantin Burlachenko
0

There shouldn't be anything wrong with just "\d+|"
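
Whether the trailing empty alternative is safe depends on the method used. The sketch below (hypothetical class and method names) shows that it behaves as intended with Java's `matches`, while with `find` an empty alternative or an empty regex also succeeds at positions inside a non-empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class EmptyAlternative {
    static final String REGEX = "\\d+|";

    // matches() must consume the whole input, so "blah" is rejected.
    static boolean matchesWhole(String input) {
        return input.matches(REGEX);
    }

    // Counts how many (possibly empty) matches find() reports.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(""));       // true
        System.out.println(matchesWhole("12"));     // true
        System.out.println(matchesWhole("blah"));   // false
        System.out.println(countMatches("", "ab")); // 3: before a, before b, at the end
    }
}
```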

umop
  • That's not correct. An empty regex will match the empty string but also (multiple times) in every non-empty string, namely at each position before or after a character. – Tim Pietzcker Jul 27 '10 at 10:04
  • 4
    @Tim: but if this is Java `matches`, then the pattern will be matched against the _entire_ input string anyway. That is, in Java, `"blah".matches("")` is `false`. – polygenelubricants Jul 27 '10 at 10:08
  • Correct. To clarify, wrapping in ^ and $ will make sure that you're talking about the whole string. – umop Jul 28 '10 at 08:24
0

To make any pattern that matches an entire string optional, i.e. to allow the pattern to match an empty string as well, use an optional group:

^(pattern)?$
^^       ^^^

If the regex engine allows (as in Java), prefer a non-capturing group since its main purpose is to only group subpatterns, not keep the subvalues captured:

^(?:pattern)?$

The ^ will match the start of a string (or \A can be used in many flavors for this), $ will match the end of string (or \z can be used to match the very end in many flavors, and Java, too), and the (....)? will match 1 or 0 (due to the ? quantifier) sequences of the subpatterns inside parentheses.

A Java usage note: when used in matches(), the initial ^ and trailing $ can be omitted and you can use

String pattern = "(?:\\d+)?";
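
A minimal usage sketch of the optional group, assuming `\d+` as the pattern; the class and method names are hypothetical. It shows the unanchored form with `matches()` next to the anchored form with `find()`.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class OptionalGroup {
    // With matches(), the anchors can be left out.
    private static final Pattern UNANCHORED = Pattern.compile("(?:\\d+)?");
    // With find(), the anchors are required to cover the whole string.
    private static final Pattern ANCHORED = Pattern.compile("^(?:\\d+)?$");

    static boolean viaMatches(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    static boolean viaFind(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "2010", "20x10"}) {
            System.out.println("\"" + s + "\": " + viaMatches(s) + " / " + viaFind(s));
        }
    }
}
```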
Wiktor Stribiżew