I use this regex to test a link:

lolspec:\/\/(spectator\.(na|euw1|eu|kr|oc1|br|la1|la2|ru|tr|pbe1)\.lol\.riotgames\.com:(80|8088)((([?&]region=(NA1|EUW1|EUN1|KR|OC1|BR1|LA1|LA2|RU|TR1|PBE1))|([?&]gameID=([0-9]+))|([?&]encKey=(.+)))){3})

I test it against this link:

lolspec://spectator.euw1.lol.riotgames.com:80?region=NA1&gameID=44584&encKey=fghgdsv1134+ianfcuia

However, some groups (#7, #8, #9) come back empty.

What should I do?

Alan Stokes
  • If that's valid, make the groups optional. – adamdc78 Mar 21 '16 at 20:10
  • You might be looking for a [non-capturing group](https://stackoverflow.com/q/3512471). – Siguza Mar 21 '16 at 20:10
  • Are you sure you need to capture the result of `(na|euw1|eu|kr|oc1|br|la1|la2|ru|tr|pbe1)`? You can still make it a group, but have it be non-capturing `(?:na|euw1|...)`. Nesting capture groups can quickly mess with result ordering if you're not careful. – Mr. Llama Mar 21 '16 at 20:11
  • Don't use a regex to parse a URL, use a URL parser. Depending on the language, this can be simplified dramatically and made much less error prone. – nickb Mar 21 '16 at 20:11
  • I need to capture it yes – Luca Laissue Mar 21 '16 at 20:11
  • I tried to reduce the number of groups using the "non-capture" but some groups are not captured – Luca Laissue Mar 21 '16 at 20:24
  • @LucaLaissue - What language is this in? How are you using this regex? – nickb Mar 21 '16 at 20:25
  • I'll use this in C++ – Luca Laissue Mar 21 '16 at 20:27
  • [group #2, #3 should contains something : Regex](https://regexper.com/#%5Elolspec%3A%5C%2F%5C%2F(spectator%5C.(%3F%3Ana%7Ceuw1%7Ceu%7Ckr%7Coc1%7Cbr%7Cla1%7Cla2%7Cru%7Ctr%7Cpbe1)%5C.lol%5C.riotgames%5C.com%3A(%3F%3A80%7C8088))(%3F%3A(%3F%3A(%3F%3A%5B%3F%26%5Dregion%3D(NA1%7CEUW1%7CEUN1%7CKR%7COC1%7CBR1%7CLA1%7CLA2%7CRU%7CTR1%7CPBE1))%7C(%3F%3A%5B%3F%26%5DgameID%3D(%5B0-9%5D%2B))%7C(%3F%3A%5B%3F%26%5DencKey%3D(.%2B)))%7B3%7D)%24) – Luca Laissue Mar 21 '16 at 20:32

1 Answer

The capture groups are probably overkill.

Your regex contains a container capture group (4) that is quantified,
like this: ( ... ){3}.

The engine overwrites that group's capture buffer on each of the 3 iterations,
so only the value from the last iteration is left in the group.

One level down is a single inner group (5) that the outer group wraps directly, like this: (( ... )){3}. It is not needed, and it suffers the same overwriting effect.

Deeper still are three capture groups separated by alternation.
They follow the same rule: each one is overwritten whenever it matches
again during one of the three quantified passes.

On any given pass, only one group in the alternation matches.
So if the input contains the same parameter more than once, the same
alternative can match on several passes, leaving the other groups empty.

So this is not the right approach if you want to match out-of-order
parameters in a string.

The usual way is to use lookahead assertions or, if your engine
supports them, conditionals.

Using conditionals, it looks like this:

 (?:
      .*? 
      (?:
           ( (?(1)(?!)) abc )           # (1)
        |  ( (?(2)(?!)) def )           # (2)
        |  ( (?(3)(?!)) ghi )           # (3)
      )
 ){3}

Each conditional (?(N)(?!)) makes its alternative fail if group N has
already matched, so after three passes every group must have captured something.
Your regex has the same structure but without the conditionals,
so it suffers the consequences described above.

By the way, your regex above does not leave any groups empty with that particular sample, but it has many problems. Expanded, with the groups numbered:

 lolspec:\/\/
 (                             # (1 start)
      spectator\.
      ( na | euw1 | eu | kr | oc1 | br | la1 | la2 | ru | tr | pbe1 )  # (2)
      \.lol\.riotgames\.com:
      ( 80 | 8088 )                 # (3)
      (                             # (4 start)
           (                             # (5 start)
                (                             # (6 start)
                     [?&] region=
                     ( NA1 | EUW1 | EUN1 | KR | OC1 | BR1 | LA1 | LA2 | RU | TR1 | PBE1 )  # (7)
                )                             # (6 end)
             |  (                             # (8 start)
                     [?&] gameID=
                     ( [0-9]+ )                    # (9)
                )                             # (8 end)
             |  (                             # (10 start)
                     [?&] encKey=
                     ( .+ )                        # (11)
                )                             # (10 end)
           )                             # (5 end)
      ){3}                          # (4 end)
 )                             # (1 end)

Output

 **  Grp 0 -  ( pos 0 , len 97 ) 
lolspec://spectator.euw1.lol.riotgames.com:80?region=NA1&gameID=44584&encKey=fghgdsv1134+ianfcuia  
 **  Grp 1 -  ( pos 10 , len 87 ) 
spectator.euw1.lol.riotgames.com:80?region=NA1&gameID=44584&encKey=fghgdsv1134+ianfcuia  
 **  Grp 2 -  ( pos 20 , len 4 ) 
euw1  
 **  Grp 3 -  ( pos 43 , len 2 ) 
80  
 **  Grp 4 -  ( pos 69 , len 28 ) 
&encKey=fghgdsv1134+ianfcuia  
 **  Grp 5 -  ( pos 69 , len 28 ) 
&encKey=fghgdsv1134+ianfcuia  
 **  Grp 6 -  ( pos 45 , len 11 ) 
?region=NA1  
 **  Grp 7 -  ( pos 53 , len 3 ) 
NA1  
 **  Grp 8 -  ( pos 56 , len 13 ) 
&gameID=44584  
 **  Grp 9 -  ( pos 64 , len 5 ) 
44584  
 **  Grp 10 -  ( pos 69 , len 28 ) 
&encKey=fghgdsv1134+ianfcuia  
 **  Grp 11 -  ( pos 77 , len 20 ) 
fghgdsv1134+ianfcuia