11

I'm trying to validate a query string with a regex. Note that I'm not trying to extract the values, only to validate the syntax. I'm doing this to practice regex, so I'd appreciate help rather than "use this lib", although seeing how a library does it would help me, so show me if you've got one.

So, these are the requirements:

  • It must start with a question mark.
  • It may contain keys with or without values. A key and its value are separated by an equals sign, and pairs are separated by ampersands.

I've got pretty far, but I'm having trouble expressing in regex that the equals sign and ampersand must appear in a certain order without having to repeat match groups. This is what I've got so far:

#^\?([\w\-]+((&|=)([\w\-]+)*)*)?$#

It correctly matches ?abc=123&def=345, but it also incorrectly matches for example ?abc=123=456.

I could go overkill and do something like...

/^\?([\w\-]+=?([\w\-]+)?(&[\w\-]+(=?[\w\-]*)?)*)?$/

... but I don't want to repeat the match groups which are the same anyway.
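
The difference between the two attempts can be checked with a minimal sketch. Python is an assumption here (the question does not name a language), the `#` and `/` delimiters are dropped because Python's `re` does not use them, and the names `LOOSE` and `STRICT` are purely illustrative. Only short inputs are used, since the nested quantifiers in the first pattern may backtrack badly on long non-matching strings.

```python
import re

# The two patterns from the question, delimiters removed.
# re.ASCII keeps \w limited to [a-zA-Z0-9_] (an assumption about the intent).
LOOSE = re.compile(r"^\?([\w\-]+((&|=)([\w\-]+)*)*)?$", re.ASCII)
STRICT = re.compile(r"^\?([\w\-]+=?([\w\-]+)?(&[\w\-]+(=?[\w\-]*)?)*)?$", re.ASCII)

for sample in ("?abc=123&def=345", "?abc=123=456"):
    loose_ok = LOOSE.fullmatch(sample) is not None
    strict_ok = STRICT.fullmatch(sample) is not None
    print(f"{sample!r}: loose={loose_ok} strict={strict_ok}")
```

The first pattern accepts both samples because `(&|=)` lets either separator appear at any position; the second one rejects `?abc=123=456`.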

How can I tell regex that the separators between values must alternate between & and = without repeating match groups or risking catastrophic backtracking?

Thank you.

Edit:

I'd like to clarify that this is not meant for a real-world implementation; for that, the built-in library of your language, which is most likely available, should be used. I'm asking this question because I want to improve my regex skills, and parsing a query string seemed like a rewarding challenge.

  • This question pops up in Google on "query string regex" search. I must note that what happens here **should not be used live** even if you are limited to regex-based solutions, since it lacks [this](https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2) point and who knows what else (I don't, but there are many pieces of specs I don't know). – Daerdemandt Feb 04 '17 at 09:04

7 Answers

15

This seems to be what you want:

^\?([\w-]+(=[\w-]*)?(&[\w-]+(=[\w-]*)?)*)?$

See live demo

This treats each "pair" as a key followed by an optional value (which may be blank). It matches a first pair followed by zero or more groups of & plus another pair, and the whole expression (except for the leading ?) is optional. Doing it this way prevents matching ?&abc=def

Also note that hyphen doesn't need escaping when last in the character class, allowing a slight simplification.

You seem to want to allow hyphens anywhere in keys or values. If keys need to be hyphen free:

^\?(\w+(=[\w-]*)?(&\w+(=[\w-]*)?)*)?$
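
One way to avoid writing the pair sub-pattern twice in source code is to build the expression from a named fragment. The following is a minimal sketch under assumptions: Python is my choice of host language, and `KEY`, `VALUE`, `PAIR`, `QUERY` and `is_valid_query` are hypothetical names, not part of this answer. The compiled pattern is equivalent to the first regex above, apart from using non-capturing groups.

```python
import re

# Hypothetical fragment names, chosen for illustration only.
KEY = r"[\w-]+"
VALUE = r"[\w-]*"
PAIR = rf"{KEY}(?:={VALUE})?"  # a key with an optional, possibly empty, value

# '?' followed by an optional list of pairs separated by '&'.
# re.ASCII restricts \w to [a-zA-Z0-9_].
QUERY = re.compile(rf"\?(?:{PAIR}(?:&{PAIR})*)?", re.ASCII)


def is_valid_query(s: str) -> bool:
    """Return True if the whole string has the expected query-string shape."""
    return QUERY.fullmatch(s) is not None


for sample in ("?", "?abc", "?abc=", "?abc=123&def=345", "?abc=123=456", "?&abc=def"):
    print(sample, is_valid_query(sample))
```

The regex engine still sees the pair twice, but the source only defines it once, which keeps the two copies from drifting apart.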
Bohemian
5

You can use this regex:

^\?([^=]+=[^=]+&)+[^=]+(=[^=]+)?$

What it does is:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \?                       '?'
--------------------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
--------------------------------------------------------------------------------
  [^=]+                    any character except: '=' (1 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2 (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )?                       end of \2 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \2)
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
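
To see what the pattern as written accepts, here is a small sketch. Python and the name `PATTERN` are my assumptions, not part of the answer. It suggests that the pattern needs at least one & and that `[^=]` also lets spaces through.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"^\?([^=]+=[^=]+&)+[^=]+(=[^=]+)?$")

for sample in ("?abc=123&def=345", "?abc=123", "?abc=123=456", "? = & "):
    print(repr(sample), PATTERN.fullmatch(sample) is not None)
```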
Amit Joki
  • 1
    this matches `"? = "` – Bohemian May 30 '14 at 16:50
  • @Amit Joki Where did you get that printout? – ooga May 30 '14 at 16:52
  • 2
    https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CCkQFjAA&url=http%3A%2F%2Frick.measham.id.au%2Fpaste%2Fexplain.pl%3Fregex%3D%255Cd%252B(.%255Cd%252B)%253F(%255BeE%255D%255Cd%252B)%253F&ei=A7aIU4eLIdiiugSp84KgDg&usg=AFQjCNHAWE1AKnZQeZkSODfrlt9lKjm28g&sig2=q5lxzXXvb5y_-DazXkiFkw&bvm=bv.67720277,d.c2E – Amit Joki May 30 '14 at 16:54
2

I agree with Andy Lester, but a possible regex solution is

#^\?([\w-]+=[\w-]*(&[\w-]+=[\w-]*)*)?$#

which is very much like what you posted.

I haven't tested it and you didn't say what language you're using so it may need a little tweaking.

ooga
1

This might not be a job for regexes, but for existing tools in your language of choice. Regexes are not a magic wand you wave at every problem that happens to involve strings. You probably want to use existing code that has already been written, tested, and debugged.

In PHP, use the parse_url function.

Perl: URI module.

Ruby: URI module.

.NET: 'Uri' class
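
As an illustration of the library route in one more language (Python, which is my addition to the list above), the standard library's `urllib.parse.parse_qsl` takes a `strict_parsing` flag that raises `ValueError` on malformed fields. The helper name `parses_strictly` is hypothetical. Note that the library's idea of "valid" is not the same grammar as in the question.

```python
from urllib.parse import parse_qsl


def parses_strictly(query: str) -> bool:
    """Return True if the standard library parses the query without error."""
    # Drop the leading '?', which parse_qsl does not expect.
    body = query[1:] if query.startswith("?") else query
    try:
        parse_qsl(body, keep_blank_values=True, strict_parsing=True)
    except ValueError:
        return False
    return True


for sample in ("?abc=123&def=345", "?abc&def=1", "?abc=123=456"):
    print(sample, parses_strictly(sample))
```

With strict parsing, a key without `=` is rejected, while a second `=` is simply treated as part of the value, so this check is both stricter and looser than the regexes discussed in this thread.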

Andy Lester
  • Thank you for your reply, but if you read my post again, you'll see `I'm doing this to practice regex, so I'd appreciate help rather than "use this lib"`. – Helge Talvik Söderström May 30 '14 at 16:39
  • Yes, I saw that, and I also know that people will find this answer anyway. I'm thinking about future users as well as you. – Andy Lester May 30 '14 at 16:39
  • Well, I interpret that as providing an answer for people who may be looking for another reply, stumbling upon this question... I'd appreciate if we stayed on subject and discussed regex, not the particular example I use for practice. – Helge Talvik Söderström May 30 '14 at 16:47
  • 1
    `parse_url()` will pretty much try to parse anything... you can't use it as validation. It even returns a result for `JSON` strings :) – Blizz Jul 09 '15 at 15:19
1

I made this.

function isValidURL(url) {
  // based off https://mathiasbynens.be/demo/url-regex. testing https://regex101.com/r/pyrDTK/2
  // JavaScript needs \uXXXX escapes instead of PCRE's \x{XXXX}, and has no "S" flag.
  var pattern = /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/?)(?:(?:\?(?:(?!&|\?)(?:\S))+=(?:(?!&|\?)(?:\S))+)(?:&(?:(?!&|\?)(?:\S))+=(?:(?!&|\?)(?:\S))+)*)?$/i;
  return pattern.test(url);
}

Base: https://mathiasbynens.be/demo/url-regex

Testing: https://regex101.com/r/pyrDTK/4/

ethanneff
0

When you need to validate a more complex URL, you may use this regex:

`^(https|ftp|http|ftps):\/\/([a-z\d_]+\.)?(([a-zA-Z\d_]+)(\.[a-zA-Z]{2,6}))(\/[a-zA-Z\d_\%\-=\+]+)*(\?)?([a-zA-Z\d=_\+\%\-&\{\}\:]+)?`
LH7
0
/^\?([\w-]+(=[\w.\-:%+]*)?(&[\w-]+(=[\w.\-:%+]*)?)*)?$/

\w = [a-zA-Z0-9_]

\? = a literal '?'

The regex above allows a-z, A-Z, 0-9 and the characters _ . - : % + in a parameter value. Keys are limited to \w and hyphens.

You can test this regex here
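
A quick way to try the pattern locally is sketched below. Python and the name `PARAM_QUERY` are my assumptions, and the `/` delimiters are dropped because Python's `re` does not use them.

```python
import re

# The pattern from this answer; re.ASCII keeps \w equal to [a-zA-Z0-9_].
PARAM_QUERY = re.compile(
    r"^\?([\w-]+(=[\w.\-:%+]*)?(&[\w-]+(=[\w.\-:%+]*)?)*)?$", re.ASCII
)

for sample in ("?t=12:30&pct=50%25&v=1.2+3", "?a=b=c", "?a=b c", "?a=x/y"):
    print(repr(sample), PARAM_QUERY.fullmatch(sample) is not None)
```

Only the first sample matches. The others contain a second `=`, a space, or a slash, none of which is in the allowed value set.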

Harsh Patel