In a regular expression, I need to know how to match one thing or another, or both (in order). But at least one of the things needs to be there.

For example, the following regular expression

/^([0-9]+|\.[0-9]+)$/

will match

234

and

.56

but not

234.56

While the following regular expression

/^([0-9]+)?(\.[0-9]+)?$/

will match all three of the strings above, but it will also match the empty string, which we do not want.
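The behaviour of the two patterns above can be checked with a short script. This is only an illustrative sketch in JavaScript; the names `eitherOnly`, `bothOptional`, and `samples` are made up for the example and are not part of the original question.

```javascript
// Pattern 1: integer part OR fractional part, never both.
const eitherOnly = /^([0-9]+|\.[0-9]+)$/;
// Pattern 2: both parts optional, so the empty string slips through.
const bothOptional = /^([0-9]+)?(\.[0-9]+)?$/;

const samples = ["234", ".56", "234.56", ""];

// Build one row per sample showing which pattern accepts it.
const results = samples.map((s) => ({
  input: s,
  eitherOnly: eitherOnly.test(s),
  bothOptional: bothOptional.test(s),
}));

console.table(results);
```

The row for "234.56" should show the first pattern rejecting it, and the row for the empty string should show the second pattern accepting it.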

I need something that will match all three of the strings above, but not the empty string. Is there an easy way to do that?

UPDATE:

Both Andrew's and Justin's answers below work for the simplified example I provided, but (unless I'm mistaken) they don't work for the actual use case I was hoping to solve, so I should probably add that now. Here's the actual regexp I'm using:

/^\s*-?0*(?:[0-9]+|[0-9]{1,3}(?:,[0-9]{3})+)(?:\.[0-9]*)?(\s*|[A-Za-z_]*)*$/

This will match

45
45.988
45,689
34,569,098,233
567,900.90
-9
-34 banana fries
0.56 points

but it WON'T match

.56

and I need it to do this.
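For anyone wanting to reproduce this, here is a small sketch in JavaScript that runs the pattern above over the listed inputs and reports which ones it rejects. The variable names are invented for the example. The integer alternation in the pattern is mandatory, which is why an input with no digits before the decimal point fails.

```javascript
// The pattern from the question, copied verbatim.
const original = /^\s*-?0*(?:[0-9]+|[0-9]{1,3}(?:,[0-9]{3})+)(?:\.[0-9]*)?(\s*|[A-Za-z_]*)*$/;

const inputs = [
  "45", "45.988", "45,689", "34,569,098,233", "567,900.90",
  "-9", "-34 banana fries", "0.56 points", ".56",
];

// Collect the inputs the pattern rejects.
const rejected = inputs.filter((s) => !original.test(s));
console.log(rejected);
```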

rharrington

5 Answers

The fully general method, given regexes /^A$/ and /^B$/ is:

/^(A|B|AB)$/

i.e.

/^([0-9]+|\.[0-9]+|[0-9]+\.[0-9]+)$/
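As a rough sketch of the general method, the combined pattern can be built from the two sub-patterns programmatically. The helper name `oneOrOtherOrBoth` is hypothetical, and wrapping each part in a non-capturing group is an extra precaution added here, not something the formula above requires.

```javascript
// Hypothetical helper: combine two sub-pattern sources into ^(A|B|AB)$.
// Each source is wrapped in a non-capturing group so that alternations
// inside A or B cannot leak into the outer alternation.
function oneOrOtherOrBoth(a, b) {
  const A = `(?:${a})`;
  const B = `(?:${b})`;
  return new RegExp(`^(?:${A}|${B}|${A}${B})$`);
}

const number = oneOrOtherOrBoth("[0-9]+", "\\.[0-9]+");

// Print each sample next to the verdict of the combined pattern.
for (const s of ["234", ".56", "234.56", "", "."]) {
  console.log(JSON.stringify(s), number.test(s));
}
```

The first three samples should be accepted and the last two rejected.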

Note the others have used the structure of your example to make a simplification. Specifically, they (implicitly) factorised it, to pull out the common [0-9]* and [0-9]+ factors on the left and right.

The working for this is:

  • All the elements of the alternation end in [0-9]+, so pull that out: /^(|\.|[0-9]+\.)[0-9]+$/
  • Now we have the possibility of the empty string in the alternation, so rewrite it using ? (i.e. use the equivalence (|a|b) = (a|b)?): /^(\.|[0-9]+\.)?[0-9]+$/
  • Again, we have an alternation with a common suffix (\. this time), so pull that out too: /^((|[0-9]+)\.)?[0-9]+$/
  • The pattern (|a+) is the same as a*, so, finally: /^([0-9]*\.)?[0-9]+$/
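
One way to sanity-check the steps above is to run every intermediate pattern against the same inputs and confirm they all give the same verdict. This is a sketch only; the sample list is hand-picked for illustration and is not a proof of equivalence.

```javascript
// Each stage of the factorisation, from the expanded form to the final one.
const stages = [
  /^([0-9]+|\.[0-9]+|[0-9]+\.[0-9]+)$/,
  /^(|\.|[0-9]+\.)[0-9]+$/,
  /^(\.|[0-9]+\.)?[0-9]+$/,
  /^((|[0-9]+)\.)?[0-9]+$/,
  /^([0-9]*\.)?[0-9]+$/,
];

// Hand-picked samples (an assumption, not an exhaustive proof).
const samples = ["234", ".56", "234.56", "", ".", "234.", "1.2.3", "a"];

// A sample is inconsistent if any stage disagrees with the first stage.
const inconsistent = samples.filter((s) => {
  const verdicts = stages.map((re) => re.test(s));
  return verdicts.some((v) => v !== verdicts[0]);
});

console.log(inconsistent.length === 0 ? "all stages agree" : inconsistent);
```
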
huon

Nice answer by huon (and a bit of a brain-twister to follow along to the end). For anyone looking for a quick and simple answer to the title of this question, 'In a regular expression, match one thing or another, or both', it's worth mentioning that even (A|B|AB) can be simplified to:

A|A?B

Handy if B is a bit more complex.

Now, as c0d3rman's observed, this, in itself, will never match AB. It will only match A and B. (A|B|AB has the same issue.) What I left out was the all-important context of the original question, where the start and end of the string are also being matched. Here it is, written out fully:

^(A|A?B)$

Better still, just switch the order as c0d3rman recommended, and you can use it anywhere:

A?B|A
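
To make the ordering issue concrete, here is a small JavaScript sketch. The letters A and B are used literally as stand-ins for real sub-patterns, and the variable names are invented for the example.

```javascript
// "A" and "B" here are literal letters standing in for real sub-patterns.
const aFirst = /A|A?B/;
const bFirst = /A?B|A/;
const anchored = /^(A|A?B)$/;

// Unanchored: the first alternative that succeeds wins.
const aFirstMatch = "AB".match(aFirst)[0];
const bFirstMatch = "AB".match(bFirst)[0];
console.log(aFirstMatch); // "A"
console.log(bFirstMatch); // "AB"

// Anchored: "$" forces the engine to backtrack into the second alternative.
const anchoredMatch = "AB".match(anchored)[0];
console.log(anchoredMatch); // "AB"
```
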
Kal
  • Note for those using this: be mindful of what you put first, especially if using capture groups. For example, in the string "AB", the regex `(A|A?B)` will only match the A. If you prefer to match both when both are present, do `(A?B|A)`. – c0d3rman Dec 24 '21 at 05:09
  • @c0d3rman: Good point! It will actually match A *and* B in AB, but yes, I left out some important context there. I like the change of order though; it makes it useable anywhere. I've updated my answer. – Kal Mar 18 '22 at 00:30

Yes, you can match all of these with such an expression:

/^[0-9]*\.?[0-9]+$/

Note, it also doesn't match the empty string (your last condition).

Andrew Logvinov

Sure. You want the optional quantifier, ?.

/^(?=.)([0-9]+)?(\.[0-9]+)?$/

The above is slightly awkward-looking, but I wanted to show you your exact pattern with some ?s thrown in. In this version, (?=.) makes sure it doesn't accept an empty string, since I've made both clauses optional. A simpler version would be this:

/^\d*\.?\d+$/

This satisfies your requirements, including preventing an empty string.
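
Here is a quick sketch in JavaScript comparing the two versions. The variable names are invented for illustration. One practical difference is that the lookahead version keeps the integer and fractional parts in separate capture groups, which the terse version does not.

```javascript
const withLookahead = /^(?=.)([0-9]+)?(\.[0-9]+)?$/;
const simpler = /^\d*\.?\d+$/;

// Show the verdict of each pattern for every sample.
for (const s of ["234", ".56", "234.56", ""]) {
  console.log(JSON.stringify(s), withLookahead.test(s), simpler.test(s));
}

// The lookahead version exposes the two parts as capture groups.
const [, whole, fraction] = "234.56".match(withLookahead);
console.log(whole, fraction); // "234" ".56"
```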

Note that there are many ways to express this. Some are long and some are very terse, but they become more complex depending on what you're trying to allow/disallow.

Edit:

If you want to match this inside a larger string, I recommend splitting on spaces and testing the results with /^\d*\.?\d+$/. Otherwise, you'll risk either matching stuff like aaa.123.456.bbb or missing matches (trust me, you will; JavaScript's lack of lookbehind support ensures that it will be possible to break any pattern I can think of).
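
A minimal sketch of that two-step approach, assuming the tokens in the larger string are separated by whitespace. The helper name `extractNumbers` and the sample sentence are made up for the example.

```javascript
const isNumber = /^\d*\.?\d+$/;

// Hypothetical helper: split on runs of whitespace, then keep numeric tokens.
function extractNumbers(text) {
  return text
    .split(/\s+/)
    .filter((token) => isNumber.test(token));
}

const found = extractNumbers("pay 123.456 or .5 but not aaa.123.456.bbb");
console.log(found);
```

Because each token is tested against the fully anchored pattern, a token such as aaa.123.456.bbb is rejected as a whole instead of being partially matched.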

If you know for a fact that you won't get strings like the above, you can use word breaks instead of ^$ anchors, but it will get complicated because there's no word break between . and (a space).

/(\b\d+|\B\.)?\d*\b/g

That ought to do it. It will block stuff like aaa123.456bbb, but it will allow 123, 456, or 123.456. It will allow aaa.123.456.bbb, but as I've said, you'll need two steps if you want to comprehensively handle that.

Edit 2: Your use case

If you want to allow whitespace at the beginning, negative/positive marks, and words at the end, those are actually fairly strict rules. That's a good thing. You can just add them on to the simplest pattern above:

/^\s*[-+]?\d*\.?\d+[a-z_\s]*$/i

Allowing thousands groups complicates things greatly, and I suggest you take a look at the answer I linked to. Here's the resulting pattern:

/^\s*[-+]?(\d+|\d{1,3}(,\d{3})*)?(\.\d+)?\b(\s[a-z_\s]*)?$/i

The \b ensures that the numeric part ends with a digit, and the optional trailing group requires at least one whitespace character before any words.
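
As a rough check, the pattern can be run against the inputs listed in the question plus a few that should fail. This is a sketch in JavaScript; the `shouldFail` entries are extra cases chosen for this example, not taken from the question.

```javascript
const finalPattern = /^\s*[-+]?(\d+|\d{1,3}(,\d{3})*)?(\.\d+)?\b(\s[a-z_\s]*)?$/i;

// Inputs from the question, including the previously failing ".56".
const shouldMatch = [
  "45", "45.988", "45,689", "34,569,098,233", "567,900.90",
  "-9", "-34 banana fries", "0.56 points", ".56",
];
// Extra cases (an assumption about what should be rejected).
const shouldFail = ["", "   ", "banana"];

const missed = shouldMatch.filter((s) => !finalPattern.test(s));
const leaked = shouldFail.filter((s) => finalPattern.test(s));
console.log({ missed, leaked });
```

Both arrays should come back empty if the pattern behaves as described.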

Justin Morgan - On strike
  • Thanks! Yeah, we need to use thousands because it's a bit of a back-and-forth thing between the user and the app, with both possibly setting values in the input field. The app will always display the number in the input box with the thousands separators, so if the user then re-submits it, we want it to validate. – rharrington Nov 14 '12 at 14:58

Maybe this helps (to give you the general idea):

(?:((?(digits).^|[A-Za-z]+)|(?<digits>\d+))){1,2}

This pattern matches letters, digits, or digits following letters, but not letters following digits. The pattern matches aa, aa11, and 11, but not 11aa, aa11aa, or the empty string. Don't be puzzled by the ".^", which means "a character followed by line start"; it is intended to prevent any match at all.

Be warned that this does not work with all flavors of regex: your regex engine must support the conditional syntax (?(named group)true|false).

Heinz Kessler