
I have a string in the format of a URL query:

string <- "key1=value1&key2=value2"

And I would like to extract all the parameter names (key1, key2).

I thought about strsplit with a split matching everything between = and an optional &.

unlist(strsplit(string, "=.+&?"))
[1] "key1"

But I guess that this pattern matches from the first = to the end of the string, including my optional & in the .+. I suspect this is because of the "greediness" of the regexp, so I tried to make it lazy, but I got a strange result.

> unlist(strsplit(string, "=.+?&?"))
[1] "key1"       "alue1&key2" "alue2" 

Now I don't really understand what is happening here, and I don't know how to make it lazy when the last matching character is optional.

I know (and I think I also understand why) that it works if I exclude & from .+, but I wish I could understand why the regexps above aren't working.

> unlist(strsplit(string, "=[^&]+&?"))
[1] "key1" "key2"

My current workaround is to do it in two steps:

unlist(sapply(unlist(strsplit(string, "&")), strsplit, split = "=.*", USE.NAMES = FALSE))

What am I doing wrong that keeps me from achieving this in one regexp? Thanks for any help.

I'm painfully learning regexps, so any other options would also be appreciated for my knowledge!

Julien Navarre
  • split's argument is supposed to describe a delimiter, not a format of the parts you wish to obtain – Aaron Feb 23 '17 at 11:54
  • If you want to extract url parameters, you may want to have a look at the `urltools` package. It may have what you need. If your goal is to learn regexp instead, by all means keep learning – GGamba Feb 23 '17 at 11:57
  • Relevant / possible duplicate of http://stackoverflow.com/questions/4350440/split-a-column-of-a-data-frame-to-multiple-columns – zx8754 Feb 23 '17 at 11:58
  • `regmatches("key1=value1&key2=value2", gregexpr("([a-zA-Z0-9]+)(?=\\=.+&?)", "key1=value1&key2=value2", perl = TRUE))[[1]]` seems to work on your example string. – nrussell Feb 23 '17 at 12:25
  • Although `shiny::parseQueryString("key1=value1&key2=value2")` seems like the simplest solution. – nrussell Feb 23 '17 at 12:27
  • @nrussell I think this is worth adding as an answer. – zx8754 Feb 23 '17 at 12:48
  • Another option, `strsplit(string, '=|&')[[1]][c(TRUE, FALSE)]` – Sotos Feb 23 '17 at 13:00

2 Answers


Your first expression doesn't work because quantifiers are greedy by default. That's why .+ matches as much as possible: everything from the first = to the end of the string, including the & in the middle. Nothing is left for &? to consume, so it matches the empty string.
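To see the greedy behaviour directly, it can help to print what the pattern matches rather than what is left after splitting. The snippet below is only a small sketch: the variable names `greedy_match` and `greedy_split` are made up for illustration, and `perl = TRUE` in `gregexpr` is a choice made here, not something the question uses.

```r
string <- "key1=value1&key2=value2"

# Extract the text consumed by the greedy pattern.
# .+ runs to the end of the string, so there is a single long match.
greedy_match <- regmatches(string, gregexpr("=.+&?", string, perl = TRUE))[[1]]
print(greedy_match)
# [1] "=value1&key2=value2"

# strsplit removes that one match, leaving only the text before it.
greedy_split <- unlist(strsplit(string, "=.+&?"))
print(greedy_split)
# [1] "key1"
```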

What's up with the second and more confusing expression?

Let's take a look at what you are doing.

> unlist(strsplit(string, "=.+?&?"))
[1] "key1"       "alue1&key2" "alue2"

You are splitting on =v, but why? Because you tried to make the quantifier lazy. But what does that mean?

? Makes the preceding quantifier lazy, causing it to match as few characters as possible.

The shortest match your regex can make is built up as follows:

= (a literal character)

.+? (one or more of any character, as few as possible)

The fewest characters this can match is one, which here is v.

&? (if this character is present, match it too)

Since .+? stopped after one character, the next character is a, not &. Because & is optional, &? matches the empty string and the whole match is just =v. strsplit therefore cuts the string at each =v, which leaves "key1", "alue1&key2" and "alue2".
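The breakdown above can be checked by extracting the matches themselves. This is a minimal sketch: the names `lazy_match`, `fixed_match` and `fixed_split` are hypothetical, `perl = TRUE` is a choice made here, and the negated class `[^&]` is the working pattern taken from the question rather than something introduced by this answer.

```r
string <- "key1=value1&key2=value2"

# Lazy version: .+? stops after one character and &? matches nothing,
# so each match is just "=" plus the first character of the value.
lazy_match <- regmatches(string, gregexpr("=.+?&?", string, perl = TRUE))[[1]]
print(lazy_match)
# [1] "=v" "=v"

# Negated class: [^&]+ cannot run past "&", so each match covers one whole
# value plus the following "&" when there is one.
fixed_match <- regmatches(string, gregexpr("=[^&]+&?", string, perl = TRUE))[[1]]
print(fixed_match)
# [1] "=value1&" "=value2"

# Splitting on those matches leaves only the parameter names.
fixed_split <- unlist(strsplit(string, "=[^&]+&?"))
print(fixed_split)
# [1] "key1" "key2"
```

Excluding & from the character class works because it limits how far the quantifier is allowed to go, instead of relying on laziness to stop it at the right place.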

Akoya

For this purpose (URL parsing), the best approach seems to be shiny::parseQueryString, as @nrussell suggested:

shiny::parseQueryString("key1=value1&key2=value2")

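Since the question asks for the parameter names only, they can be read off the names of the parsed result. This is a sketch that assumes the shiny package is installed; the guard with `requireNamespace` and the variable names `params` and `param_names` are additions for illustration.

```r
string <- "key1=value1&key2=value2"

# Only run if shiny is available (assumption: it may not be installed).
if (requireNamespace("shiny", quietly = TRUE)) {
  # parseQueryString returns a named list, one element per parameter.
  params <- shiny::parseQueryString(string)

  # The parameter names are the names of that list.
  param_names <- names(params)
  print(param_names)
}
```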

Julien Navarre