4

I am using SobiPro, a directory system for Joomla, and I have a text field holding a comma-separated list whose entries should contain only alphanumerics, spaces, and hyphens. A sample of what might be in this field:

Toy Kites, Plastic Wheels, 1-Way Gizmos, Metal Spools, 3M Wire Ties

This regex would validate what they enter on the form prior to a field save.

I thought this: (\w+)(,\s*\w+)*

But clearly that is not right, and it does not account for the hyphens. Any help is appreciated, thanks!

ΩmegaMan
C Szymaszek

4 Answers

23

Try this:

^[-\w\s]+(?:,[-\w\s]*)*$

Using ^ and $ ensures that we validate the entire value, and don't just find a match somewhere within.

The first character class, [-\w\s]+, allows one or more alphanumeric, whitespace, or dash characters. The dash goes first in the class brackets so that it is read as a literal dash rather than as a range operator.

The second group allows zero or more repetitions with separating commas. It is wrapped in non-capturing parentheses, a small performance optimization: (?: … )*

Notes:

  • This expression allows empty entries, such as A,B,,D. If you don't want to allow this, change the * inside the group (the second-to-last * in the expression) to a +.
  • The \w shorthand also allows underscores. To prevent this, replace each \w with A-Za-z0-9.
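To see what the first note means in practice, here is a minimal sketch in Python (chosen only for illustration; SobiPro itself is not assumed to use Python). The names `ALLOW_EMPTY`, `REQUIRE_ENTRY`, and `is_valid` are made up for this example.

```python
import re

# Pattern from the answer: entries between commas may be empty.
ALLOW_EMPTY = re.compile(r'^[-\w\s]+(?:,[-\w\s]*)*$')
# Variant with the inner * changed to +: each entry needs at least one character.
REQUIRE_ENTRY = re.compile(r'^[-\w\s]+(?:,[-\w\s]+)*$')

sample = "Toy Kites, Plastic Wheels, 1-Way Gizmos, Metal Spools, 3M Wire Ties"


def is_valid(pattern, text):
    """Return True if the anchored pattern accepts the whole text."""
    return pattern.search(text) is not None


print(is_valid(ALLOW_EMPTY, sample))       # True
print(is_valid(ALLOW_EMPTY, "A,B,,D"))     # True: empty entry accepted
print(is_valid(REQUIRE_ENTRY, "A,B,,D"))   # False: empty entry rejected
```

Because both patterns are anchored with ^ and $, a value containing any other character (a semicolon, for instance) is rejected by both.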
Jay
3

Use character classes.

^([0-9A-Za-z -]+)(,[0-9A-Za-z -]+)*$

Note that \w includes underscores, which is why I'm expanding it to alphanumeric ranges.
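The difference between \w and the explicit ranges can be checked with a short sketch, here in Python purely for illustration. `WITH_W` is a hypothetical variant of the pattern above that uses \w; `EXPLICIT` is the pattern from this answer.

```python
import re

# Hypothetical variant using the \w shorthand (accepts underscores).
WITH_W = re.compile(r'^([-\w ]+)(,[-\w ]+)*$')
# Pattern from the answer, with explicit alphanumeric ranges.
EXPLICIT = re.compile(r'^([0-9A-Za-z -]+)(,[0-9A-Za-z -]+)*$')

value = "Metal_Spools, 3M Wire Ties"

print(bool(WITH_W.match(value)))    # True: \w matches the underscore
print(bool(EXPLICIT.match(value)))  # False: underscore is not in the class
```

Both patterns accept the same value once the underscore is replaced by a space.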

Thanks to @Jay for pointing out missing anchors.

Andrew Cheong
  • @Jay - Were I you, I would've commented, "Valid answer too, but don't forget you may want to anchor your regex!" IMHO, that's much more constructive than tersely commenting with a counterexample that seems designed to discredit rather than elucidate. – Andrew Cheong Sep 11 '13 at 16:36
  • Apologies, acheong87. It has been my experience that a comment is met with an update and then I delete the comment. – Jay Sep 11 '13 at 17:10
  • @AndrewCheong what if we want to include all the special characters, do we add it manually – Akhilesh Oct 07 '20 at 15:09
  • @Akhilesh - You can add them manually, and some have to be escaped with a backslash. (To be safe you can escape all of them with a backslash.) You could also maybe do `[^\w\s]` which means "NOT a letter or digit or whitespace char" but I'm not sure if it would capture too many other symbols. – Andrew Cheong Oct 07 '20 at 15:18
  • @AndrewCheong Ok. say we have this `^[\w:=]+(?:,[\w:=]+)*$` but it matches only `:=` of the special characters. So, why can't we do something like this `^[.*?]+(?:,[.*?]+)*$` (we can allow everything instead of manually adding special characters). – Akhilesh Oct 07 '20 at 15:51
  • @Akhilesh - I think you might be misunderstanding how `[]` works. `[\w:=]` is the same as `[=:\w]`, for example. Both mean \w OR : OR =. Your `[.*?]` means . OR * OR ?, each taken literally. You want `.` to behave inside the brackets the way `\w` does, but it does not: `\w` is a shorthand character class and keeps its meaning inside `[]`, whereas `.` is a wildcard only outside them. So use the wildcard outside a character class: `^.*?(?:,.*?)*$` – Andrew Cheong Oct 07 '20 at 17:07
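The point made in the last comment can be checked with a small Python sketch (Python is used only for illustration; the variable names are made up here).

```python
import re

# Inside brackets, '.', '*' and '?' are literal characters, not operators.
literal_class = re.compile(r'^[.*?]+$')
print(bool(literal_class.match("abc")))   # False: letters are not in the class
print(bool(literal_class.match("?.*")))   # True: only the three literal symbols

# Order inside a character class does not matter.
a = re.compile(r'^[\w:=]+$')
b = re.compile(r'^[=:\w]+$')
print(bool(a.match("x:=1")), bool(b.match("x:=1")))  # True True

# Outside brackets, '.' is a wildcard, so this accepts any symbols.
wildcard = re.compile(r'^.*?(?:,.*?)*$')
print(bool(wildcard.match("a:=b,c!d")))   # True
```

Note that the wildcard version accepts essentially any single-line input, so it no longer validates much.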
3

Try this:

[-\w\s]+(,[-\w\s]+)*

[-\w\s] means a word character, whitespace character, or hyphen.

A word character usually includes _; if you want to exclude it, replace \w with A-Za-z0-9:

[-A-Za-z0-9\s]+(,[-A-Za-z0-9\s]+)*
Bernhard Barker
  • This will match invalid inputs such as `!@#$%^&*()0,` – Jay Sep 11 '13 at 16:00
  • @Jay Based on your answer, it seems you assumed a `search` is done, rather than a `match`. I assumed the latter. Yours would work for either, but would have redundant characters for a `match`. I realize some languages may only support one of them and I have no idea what SobiPro uses, so it could indeed be a `search`, although my (possibly incorrect) assumption, based on the question, is that the regex was working except for the `-`, so it uses/supports `match` as the regex didn't contain `^` or `$`. – Bernhard Barker Sep 11 '13 at 16:11
  • The OP does indicate that the purpose of the expression is input validation. In this usage, a match should only occur when the input is wholly valid. – Jay Sep 11 '13 at 16:21
  • @Jay I think you may have misunderstood my comment, or I misunderstood your second one. To elaborate - Java, for example, has a `matches` function that compares the whole string against a regex. If such a function is used, the `^` and `$` are not required. Java also has a `find` function, for which the `^` and `$` are required. – Bernhard Barker Sep 11 '13 at 16:56
  • Thanks for the clarification, Dukeling. I had not heard of a language making that distinction. – Jay Sep 11 '13 at 17:15
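The search-versus-match distinction discussed in these comments can be sketched in Python, where `re.search` plays roughly the role of Java's `find` and `re.fullmatch` that of Java's `matches`. This is only an illustration; how SobiPro applies the expression is not known here.

```python
import re

UNANCHORED = r'[-\w\s]+(,[-\w\s]+)*'    # pattern from this answer
ANCHORED = r'^[-\w\s]+(,[-\w\s]+)*$'    # same pattern with ^ and $ added
bad = "!@#$%^&*()0,"                    # invalid input from the first comment

found = re.search(UNANCHORED, bad)      # a search finds the "0" inside the junk
whole = re.fullmatch(UNANCHORED, bad)   # a whole-string match rejects the input
anchored = re.search(ANCHORED, bad)     # anchors make the search reject it too

print(found.group())   # 0
print(whole)           # None
print(anchored)        # None
```

Python's `re.match` is a third case: it anchors only at the start of the string, so it is not equivalent to Java's `matches`.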
1

You can use a character class for this:

[\w\s-]+(,[\w\s-]*)*

I've made the character class inside the group optional in order to allow empty fields.

If your validator doesn't force the regex to always match the entire input field, you may need to anchor it by surrounding it with ^ at the start and $ at the end.

Tim Pietzcker
  • @Jay: Depends how the validation rule is applied. Some rules are anchored to the start and end of the string automatically. But I must admit I don't know how SobiPro does it. – Tim Pietzcker Sep 11 '13 at 16:48