93

What is the regular expression to validate a comma delimited list like this one:

12365, 45236, 458, 1, 99996332, ......
user229044
everLearningStudent

12 Answers

130

I suggest the following:

(\d+)(,\s*\d+)*

which would work for a list containing 1 or more elements.
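As a rough illustration of using this pattern for validation, here is a minimal JavaScript sketch. The `^` and `$` anchors and the function name `isNumberList` are assumptions added here, not part of the pattern above; without the anchors the pattern can also match a fragment inside a longer, invalid string.

```javascript
// Anchored variant of the pattern above (the anchors are an added assumption).
const listPattern = /^(\d+)(,\s*\d+)*$/;

// Hypothetical helper: true only if the whole string is a comma-delimited list of integers.
function isNumberList(text) {
  return listPattern.test(text);
}

console.log(isNumberList("12365, 45236, 458, 1, 99996332")); // true
console.log(isNumberList("1"));      // true (a single element, no comma)
console.log(isNumberList("12,,3"));  // false (empty element)
console.log(isNumberList(""));       // false (at least one number is required)
```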

kenorb
Asaph
  • 1
    You are right, I had to strip a first character before I could use the regex. Thanks all for helping out – everLearningStudent Sep 15 '09 at 17:54
  • 1
    @ondrobaco: You're probably only inspecting the first match group. The next match group will contain the rest of the list. – Asaph Dec 28 '09 at 06:12
  • 5
    the above solution won't validate an empty list. `(^$)|(^(\d+)(,\s*\d+)*$)` might work though. – Chris Dec 09 '11 at 09:33
  • 1
    @Val: The problem with [your solution](http://stackoverflow.com/a/14460209/166339) is that it will not match lists that have no commas at all, such as `"1"` or `"12345"`. These lists don't contain multiple items, so they have no commas, and your regex `(\d+,)*` mandates that every number is followed by a comma. – Asaph Jan 22 '13 at 18:15
  • Excuse me. I have just realised that collapsing the lists into a single match is what you want. I will remove my answers. – Val Jan 23 '13 at 11:52
  • 4
    How would one go about matching/extracting each element (with a regex)? – Gustavo Puma Sep 15 '14 at 08:38
  • @vash47 That would depend on your programming language. Please post a separate question for that one. – Asaph Sep 17 '14 at 18:19
  • `([0-9]+(?:\.[0-9]+)?|0?\.[1-9]+)` for integer and decimal numbers separated by comma – Junior Mayhé Apr 12 '19 at 14:24
  • How about this one: `([0-9]+(?:\.[0-9]+)?|0?\.[0-9]+|\.[0-9])` – Junior Mayhé Apr 13 '19 at 09:50
  • @JuniorM Seems unnecessarily redundant. It can be reduced to `([0-9]+(?:\.[0-9]+)?|\.[0-9]+)`. – Asaph Apr 13 '19 at 15:12
  • The downside of this solution is that it repeats the pattern for a list item. Imagine you have a comma-separated list of emails instead. I like the approach by K.P. in that it solves not only the problem of the OP, but of other people who arrive here from searching for a 'comma-delimited list' which matches the question's title. – ob-ivan Sep 17 '20 at 15:56
  • I am not sure why this is matching only the first and last element? – Felix Farquharson Feb 23 '22 at 00:16
29

This regex extracts an element from a comma separated list, regardless of contents:

(.+?)(?:,|$)

If you just replace the comma with something else, it should work for any delimiter.
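To show how this pattern can pull out every element rather than just the first, here is a small JavaScript sketch. The helper name `extractItems` and the `trim()` call are assumptions added for illustration; the global flag is what makes `matchAll` walk through the whole string.

```javascript
// Hypothetical helper: collect the first capture group of every match of the pattern above.
function extractItems(text) {
  const itemPattern = /(.+?)(?:,|$)/g; // "g" is required by matchAll
  // trim() removes the space that follows each comma in the sample input
  return Array.from(text.matchAll(itemPattern), (m) => m[1].trim());
}

console.log(extractItems("12365, 45236, 458")); // [ '12365', '45236', '458' ]
console.log(extractItems("a,b,c"));             // [ 'a', 'b', 'c' ]
```

Because `.+?` needs at least one character, empty items (two delimiters in a row) are not returned as empty strings by this sketch.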

K. P.
  • Does it extract more than one element? – panza May 06 '16 at 13:00
  • 2
    To deal with whitespace after the commas, as in the OP, I suggest this slight modification: `(.+?)(?:,\s*|$)` – Chad Cloman Oct 22 '19 at 13:18
  • 2
    @paranza - yes this will extract more than one element, but only if global matching is enabled, where whatever function you're using returns all matches instead of just the first one. In the old days you did this by putting a 'g' after the closing slash (e.g., `/expr/g`), but apparently it's not all that standard. In PHP, for example you have to use `preg_match_all()` instead of `preg_match()`. Other flavors of regex have other ways of doing it. – Chad Cloman Oct 22 '19 at 14:29
13

It depends a bit on your exact requirements. I'm assuming: all numbers, any length; numbers cannot have leading zeros or contain commas or decimal points; individual numbers are always separated by a comma then a space; and the last number does NOT have a comma and space after it. Any of these being wrong would simplify the solution.

([1-9][0-9]*,[ ])*[1-9][0-9]*

Here's how I built that mentally:

[0-9]  any digit.
[1-9][0-9]*  leading non-zero digit followed by any number of digits
[1-9][0-9]*,  as above, followed by a comma
[1-9][0-9]*,[ ]  as above, followed by a space
([1-9][0-9]*,[ ])*  as above, repeated 0 or more times
([1-9][0-9]*,[ ])*[1-9][0-9]*  as above, with a final number that doesn't have a comma.
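The assumptions listed in this answer can be checked with a short JavaScript sketch. It uses the full pattern `([1-9][0-9]*,[ ])*[1-9][0-9]*` given at the top of the answer; the `^`/`$` anchors and the name `strictList` are additions made here so that the whole string has to match.

```javascript
// Full pattern from the answer, anchored (the anchors are an added assumption).
const strictList = /^([1-9][0-9]*,[ ])*[1-9][0-9]*$/;

console.log(strictList.test("12365, 45236, 458")); // true
console.log(strictList.test("12365,45236"));       // false (comma without a space)
console.log(strictList.test("012, 5"));            // false (leading zero)
console.log(strictList.test("12, 5, "));           // false (trailing comma and space)
```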
Brad Gilbert
mcherm
  • I found this answer really useful, just needed a little tweak in order to accept whitespaces before and after the comma `([1-9][0-9]*[ ]*,[ ]*)*[1-9][0-9]*` ... maybe somebody will find this useful – pollirrata Apr 25 '12 at 19:51
  • I like this example the best, how would I allow line breaks after this? – justinpees Aug 06 '17 at 14:33
9

Match duplicate comma-delimited items:

(?<=,|^)([^,]*)(,\1)+(?=,|$)

Reference.
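A quick way to try the duplicate-matching pattern is the JavaScript sketch below. It assumes an engine with lookbehind support (for example a recent Node.js), and the helper name `firstDuplicateRun` is made up for this illustration.

```javascript
// Pattern from above: an item followed by one or more copies of itself.
const duplicateRun = /(?<=,|^)([^,]*)(,\1)+(?=,|$)/;

// Hypothetical helper: return the first run of repeated adjacent items, or null.
function firstDuplicateRun(list) {
  const m = list.match(duplicateRun);
  return m ? m[0] : null;
}

console.log(firstDuplicateRun("a,b,b,c"));   // 'b,b'
console.log(firstDuplicateRun("1,2,2,2,3")); // '2,2,2'
console.log(firstDuplicateRun("1,22,2"));    // null ("22" and "2" are different items)
```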

This regex can be used to split the values of a comma-delimited list. List elements may be quoted, unquoted or empty. Commas inside a pair of quotation marks are not matched.

,(?!(?<=(?:^|,)\s*"(?:[^"]|""|\\")*,)(?:[^"]|""|\\")*"\s*(?:,|$))

Reference.

Martin Liversage
madcolor
  • What exactly is the pipe symbol (|) doing there ? It's the one symbol not explained in the page you link to, and I can't make sense of it. – Thomas Vander Stichele Jan 25 '13 at 20:32
  • 1
    @ThomasVanderStichele: It's for alternation. `(foo|bar)` matches either `foo` or `bar`. For more information: http://www.regular-expressions.info/alternation.html – Amal Murali Jun 10 '14 at 14:30
7
/^\d+(?:, ?\d+)*$/
w35l3y
2

I used this for a list of alphanumeric items where an item may contain underscores but must not start with one.

^(([0-9a-zA-Z][0-9a-zA-Z_]*)([,][0-9a-zA-Z][0-9a-zA-Z_]*)*)$
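For reference, a minimal JavaScript check of this pattern might look like the following. The constant name `identifierList` and the sample inputs are assumptions chosen for illustration.

```javascript
// Pattern from above: items start with a letter or digit, then letters, digits or underscores.
const identifierList = /^(([0-9a-zA-Z][0-9a-zA-Z_]*)([,][0-9a-zA-Z][0-9a-zA-Z_]*)*)$/;

console.log(identifierList.test("foo,bar_1,x9")); // true
console.log(identifierList.test("_foo,bar"));     // false (item starts with an underscore)
console.log(identifierList.test("foo, bar"));     // false (whitespace is not allowed)
```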
chown
PPPaul
1

I had a slightly different requirement, to parse an encoded dictionary/hashtable with escaped commas, like this:

"1=This is something, 2=This is something,,with an escaped comma, 3=This is something else"

I think this is an elegant solution, with a trick that avoids a lot of regex complexity:

if (string.IsNullOrEmpty(encodedValues))
{
    return null;
}
else
{
    var retVal = new Dictionary<int, string>();
    var reFields = new Regex(@"([0-9]+)\=(([A-Za-z0-9\s]|(,,))+),");
    foreach (Match match in reFields.Matches(encodedValues + ","))
    {
        var id = match.Groups[1].Value;
        var value = match.Groups[2].Value;
        retVal[int.Parse(id)] = value.Replace(",,", ",");
    }
    return retVal;
}

I think it can be adapted to the original question with an expression like @"([0-9]+),\s?", reading each number from Groups[1] (Groups[0] is the whole match, including the trailing comma).
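The same "append a trailing comma" trick can be sketched in JavaScript for the plain number list from the question. This is only an illustration under assumptions: the function name `parseNumbers` is hypothetical, and the code is a JavaScript analogue rather than a translation of the C# above.

```javascript
// Hypothetical helper: append a comma so that every number, including the last,
// is followed by the delimiter the pattern expects.
function parseNumbers(list) {
  const field = /([0-9]+),\s?/g;
  return Array.from((list + ",").matchAll(field), (m) => Number(m[1]));
}

console.log(parseNumbers("12365, 45236, 458")); // [ 12365, 45236, 458 ]
console.log(parseNumbers("12, abc, 7"));        // [ 12, 7 ]
```

As the second call shows, this extracts numbers but does not validate the list: items that are not numbers are silently skipped.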

I hope it's helpful to somebody and thanks for the tips on getting it close to there, especially Asaph!

Alan Moore
Brian Birtle
1

You might want to specify language just to be safe, but

(\d+, ?)+(\d+)?

ought to work

David Berger
1

In JavaScript, use split to help out, and catch any negative numbers as well:

'-1,2,-3'.match(/(-?\d+)(,\s*-?\d+)*/)[0].split(',');
// ["-1", "2", "-3"]
// may need trimming if digits are space-separated
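A slightly more defensive variant of the same idea is sketched below. The anchors, the `trim()` step, the conversion to numbers and the name `parseSignedList` are all assumptions added here; the one-liner above throws if `match` returns `null`, which this version avoids.

```javascript
// Hypothetical helper: validate the whole string, then split and convert each item.
function parseSignedList(text) {
  const m = text.match(/^-?\d+(?:,\s*-?\d+)*$/);
  if (m === null) {
    return null; // not a valid list of (possibly negative) integers
  }
  return m[0].split(",").map((item) => Number(item.trim()));
}

console.log(parseSignedList("-1, 2, -3")); // [ -1, 2, -3 ]
console.log(parseSignedList("1,,2"));      // null
```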
crazy4groovy
0

The following will match any comma-delimited word/digit/space combination:

(((.)*,)*)(.)*
Aidan
0

Why don't you work with groups:

^(\d+(, )?)+$
Suraj Rao
Benni
0

If you have a more complicated regex, e.g. for valid URLs rather than just numbers, you can split the list and test each element individually against your regex:

const validRelativeUrlRegex = /^(^$|(?!.*(\W\W))\/[a-zA-Z0-9\/-]+[^\W_]$)/;

const relativeUrls = "/url1,/url-2,url3";

const startsWithComma = relativeUrls.startsWith(",");
const endsWithComma = relativeUrls.endsWith(",");

const areAllURLsValid = relativeUrls
  .split(",")
  .every(url => validRelativeUrlRegex.test(url));

const isValid = areAllURLsValid && !endsWithComma && !startsWithComma;
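The split-and-test approach can be wrapped in a small reusable helper, sketched here under assumptions: the name `isDelimitedList`, the `trim()` call and the sample patterns are illustrative only. The item pattern should not carry the `g` flag, because `test` on a global regex keeps state between calls.

```javascript
// Hypothetical helper: every item of the list must match itemPattern on its own.
function isDelimitedList(list, itemPattern, delimiter = ",") {
  return list
    .split(delimiter)
    .every((item) => itemPattern.test(item.trim()));
}

console.log(isDelimitedList("12365, 45236, 458", /^\d+$/)); // true
console.log(isDelimitedList("12365,, 458", /^\d+$/));       // false (empty item)
console.log(isDelimitedList("/url1;/url-2", /^\/[a-zA-Z0-9\/-]+$/, ";")); // true
```

Because an empty item fails any pattern that requires at least one character, leading and trailing delimiters are rejected without separate checks in that case.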