Extracting parameter from URL in R

Question

I'd like to extract the 'destinationId' parameter from a batch of URLs.

If I have a URL like this:

https://urlaub.xxx.de/lastminute/europa/zypern-griechenland/?destinationId=45&semcid=de.ub

How would I extract the 45? (destinationId=45)

I tried something like this, but I can't get it to work:

destinationIdParameter <- sub("[^0-9].*","",sub("*?\\destinationId=","",url))

Possible duplicate of [How to match the bundle id for android app?](https://stackoverflow.com/questions/49628728/how-to-match-the-bundle-id-for-android-app) — Lance Toth, Apr 03 '18 at 11:46
Possible duplicate of [Extract URL parameters and values in R](https://stackoverflow.com/questions/34811595/extract-url-parameters-and-values-in-r) — Munim Munna, Apr 03 '18 at 13:38

Stéphane Laurent · Accepted Answer · 2018-04-03T11:11:52.123

4

With stringr you can get it like this:

> library(stringr)
> address <- "https://urlaub.xxx.de/lastminute/europa/zypern-griechenland/?destinationId=45&semcid=de.ub"
> str_match(address, "destinationId=(.*?)&")[,2]
[1] "45"

If (like me) you're not comfortable with regular expressions, use the qdapRegex package:

> library(qdapRegex)
> address <- "https://urlaub.xxx.de/lastminute/europa/zypern-griechenland/?destinationId=45&semcid=de.ub"
> ex_between(address, "destinationId=", "&")
[[1]]
[1] "45"

edited Apr 03 '18 at 11:11

answered Apr 03 '18 at 11:06

Stéphane Laurent

Thanks! I really like the qdapRegex approach as regular expressions are confusing. It's not as quick to compute as gsub solution tho :( – Tim496 Apr 03 '18 at 12:52

score 1 · Answer 2 · answered Apr 03 '18 at 11:17

With base R you can extract the number in a few ways. If you are certain there is only ever one number in URLs of this kind, you can simply strip everything that is not a digit:

> url <- "https://urlaub.xxx.de/lastminute/europa/zypern-griechenland/?destinationId=45&semcid=de.ub"
> gsub("[^0-9]", "", url)
[1] "45"
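To show why this only works when the URL contains no other digits, here is a small sketch. The URL below is made up for illustration and is not from the question; every digit in it ends up in the result, not only the destinationId value.

```r
# Hypothetical URL whose path and other parameters also contain digits
url2 <- "https://example.com/hotel24/?destinationId=45&page=3"

# Removing every non-digit keeps ALL digits, in order of appearance
digits_only <- gsub("[^0-9]", "", url2)
digits_only
# [1] "24453"
```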

Or, if you want to be safer and pick out specifically the number that follows "destinationId=" rather than any other, you could do something like this:

# regexpr() returns the first match as a character vector (gregexpr() would return a list)
destId <- regmatches(url, regexpr("destinationId=\\d+", url))
gsub("[^0-9]", "", destId)

s_baldur · Answer 3 · 2018-04-03T13:13:11.407

1

To extract the destinationId value from the URL, you could do:

gsub(".+destinationId=(\\d+).+", "\\1", url)

Here \\1 refers to the text captured by the parentheses ().
.+ matches any sequence of one or more characters.
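One thing to watch with this pattern: because the trailing .+ needs at least one character, it can misbehave when destinationId is the last thing in the URL. The sketch below uses made-up URLs (not from the question) and compares the pattern with a variant that uses .* (zero or more characters) instead; treat it as an illustration rather than a drop-in fix.

```r
# Hypothetical URLs where the parameter is the last thing in the string
last1 <- "https://example.com/?semcid=de.ub&destinationId=45"
last2 <- "https://example.com/?semcid=de.ub&destinationId=7"

# Original pattern: the trailing .+ must consume at least one character
strict1 <- gsub(".+destinationId=(\\d+).+", "\\1", last1)  # "4": the 5 is consumed by .+
strict2 <- gsub(".+destinationId=(\\d+).+", "\\1", last2)  # no match, input returned unchanged

# Variant with .* on both sides, which also matches an empty remainder
loose1 <- sub(".*destinationId=(\\d+).*", "\\1", last1)    # "45"
loose2 <- sub(".*destinationId=(\\d+).*", "\\1", last2)    # "7"
```

Note that when nothing matches at all, sub() and gsub() return the input unchanged, so it may be worth checking with grepl() first.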

edited Apr 03 '18 at 13:13

answered Apr 03 '18 at 11:27

s_baldur

score 1 · Answer 4 · answered Feb 03 '21 at 01:17

I think the best way is parameters() from the urltools package:

library(urltools)
example_url <- "http://en.wikipedia.org/wiki/Aaron_Halfaker?debug=true"
parameters(example_url)
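As far as I know, parameters() gives back the query string as a whole rather than a single value. To get just the destinationId from the question's URL, urltools also has param_get(), which takes the URLs and the parameter names. A minimal sketch, assuming urltools is installed; the variable names res and dest_id are only for illustration:

```r
library(urltools)

url <- "https://urlaub.xxx.de/lastminute/europa/zypern-griechenland/?destinationId=45&semcid=de.ub"

# param_get() returns a data frame with one column per requested parameter
res <- param_get(url, "destinationId")
dest_id <- res$destinationId
dest_id
```

Since param_get() accepts a vector of URLs, the same call should work for a whole batch.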

answered Feb 03 '21 at 01:17

stevec

Jan · Answer 5 · 2018-04-03T11:50:39.077

With base R, we can do:

url <- "https://urlaub.xxx.de/lastminute/europa/zypern-griechenland/?destinationId=45&semcid=de.ub"

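extract <- function(url) {
  # \\K drops everything matched so far, so only the digits are returned;
  # it is a PCRE feature and needs perl = TRUE
  pattern <- "destinationId=\\K\\d+"
  regmatches(url, regexpr(pattern, url, perl = TRUE))
}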

print(extract(url))

Alternatively (no perl = TRUE):

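vanilla_extract <- function(url) {
  # Capture everything after "destinationId=" up to the next "&"
  pattern <- "destinationId=([^&]+)"
  # regexec() gives the full match first, then the capture group
  regmatches(url, regexec(pattern, url))[[1]][2]
}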

Both yield

[1] "45"
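Since the question mentions a batch of URLs, here is a rough sketch of how the regexec() approach could be vectorised so that URLs without the parameter give NA. The helper name get_param and the example.com URLs are made up for illustration, and it assumes the parameter name contains no regex metacharacters.

```r
# Hypothetical helper: extract one query parameter from a vector of URLs
get_param <- function(urls, name) {
  # Assumes `name` has no regex metacharacters (letters, digits, underscores only)
  pattern <- paste0("[?&]", name, "=([^&#]*)")
  matches <- regmatches(urls, regexec(pattern, urls))
  # Each element is c(full match, capture group), or character(0) if nothing matched
  vapply(matches,
         function(m) if (length(m) >= 2) m[2] else NA_character_,
         character(1))
}

urls <- c(
  "https://urlaub.xxx.de/lastminute/europa/zypern-griechenland/?destinationId=45&semcid=de.ub",
  "https://example.com/?semcid=de.ub&destinationId=7",  # parameter is last
  "https://example.com/?semcid=de.ub"                   # parameter is missing
)

ids <- get_param(urls, "destinationId")
ids
# [1] "45" "7"  NA
```

Anchoring on [?&] avoids matching a longer parameter name that merely ends in "destinationId", and stopping at & or # handles the case where the parameter comes last.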
