I would like to validate a URL path to make sure it doesn't include things like back-to-back ?'s, &'s, ='s or -'s. The path should only include a-z, A-Z, 0-9, /, ?, -, &, and =.

For example, these should pass:

item/643fe4ac-e87d-4b71-8fd1-522154f933c2/okay
person/adam?height=23&favcolor=blue
city/building/916fe4ac-e87d-4b71-8fd1-522154f933r5

While these should fail:

item/643fe4ac--e87d-4b71-8fd1---522154f933c2/okay
person/adam??height=23&favcolor=blue
city/@/916fe4ac-e87d-4b71-8fd1-522154f933r5

Solutions I've found online either don't work when I try them out on https://regexr.com/ (for example, this), or they are built for a non-dynamic URL path or for specific situations (e.g. this or this).

I've tried building one from scratch, but I'm very inexperienced with regex. So far I only have a starting point of [a-zA-Z0-9/]*, which matches runs of letters, digits, and slashes but needs a lot of work to get to what I want.

Attila

1 Answer

Okay, so you are already using a character class. Start with a character class that lists the characters that must not appear twice in a row:

[-\/?&]

Then capture that class and follow it with a backreference. A match means one of those characters appeared twice in a row:

([-\/?&])\1
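
To see the backreference at work, here is a minimal Python sketch. Python and the helper name `first_duplicate` are assumptions made for illustration; the pattern is the one above, unchanged. Note that `\1` only matches a repeat of the same character, so `?&` is not flagged.

```python
import re

# Capture one of the listed characters, then require that same character again via \1.
DUPLICATE = re.compile(r"([-\/?&])\1")

def first_duplicate(path):
    """Return the first doubled character pair in path, or None if there is none.

    Hypothetical helper, not part of the original answer.
    """
    match = DUPLICATE.search(path)
    return match.group(0) if match else None

print(first_duplicate("person/adam??height=23"))  # ??
print(first_duplicate("item/643fe4ac--e87d"))     # --
print(first_duplicate("person/adam?&height=23"))  # None
```

As posted, the class does not include `=`, so `==` is not reported; adding `=` inside the brackets would cover it.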

You can then add a negated character class to check whether a disallowed character is present:

[^A-Za-z0-9\/&=?-]
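
As a rough illustration of the negated class on its own, the Python sketch below lists every character that falls outside the allowed set. The helper name `disallowed_chars` is hypothetical; the pattern is copied from above.

```python
import re

# Negated class: matches any single character that is NOT in the allowed set.
DISALLOWED = re.compile(r"[^A-Za-z0-9\/&=?-]")

def disallowed_chars(path):
    """Return a list of every character in path outside the allowed set.

    Hypothetical helper, not part of the original answer.
    """
    return DISALLOWED.findall(path)

print(disallowed_chars("city/@/916fe4ac"))        # ['@']
print(disallowed_chars("person/adam?height=23"))  # []
```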

These two expressions can be put together:

(?:([-\/?&])\1|[^A-Za-z0-9\/&=?-])

If this pattern matches anywhere in the string, the string is invalid.

https://regex101.com/r/zPIObe/3/
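
The combined pattern can be checked against the six sample paths from the question with a short Python sketch. The function name `is_valid_path` is an assumption for illustration; the logic is simply "valid when the pattern finds nothing".

```python
import re

# Either a doubled character from the first class, or any character outside the allowed set.
INVALID = re.compile(r"(?:([-\/?&])\1|[^A-Za-z0-9\/&=?-])")

def is_valid_path(path):
    """Return True when the invalid-pattern finds no match in path.

    Hypothetical helper, not part of the original answer.
    """
    return INVALID.search(path) is None

should_pass = [
    "item/643fe4ac-e87d-4b71-8fd1-522154f933c2/okay",
    "person/adam?height=23&favcolor=blue",
    "city/building/916fe4ac-e87d-4b71-8fd1-522154f933r5",
]
should_fail = [
    "item/643fe4ac--e87d-4b71-8fd1---522154f933c2/okay",
    "person/adam??height=23&favcolor=blue",
    "city/@/916fe4ac-e87d-4b71-8fd1-522154f933r5",
]

for path in should_pass + should_fail:
    print(is_valid_path(path), path)
```

Two caveats with the pattern as posted: an empty string counts as valid because there is nothing to match, and a doubled `=` still passes because `=` is not in the duplicate class.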

chris85