
Let's say we have a sentence like this:

sentence="thanks for coming please visit https://www.stackoverflow.com for more or look me up on https://www.linkedin.com"

sentence=as.data.frame(sentence)

I'd like to extract only the first URL.

This method works when a sentence contains a single URL, but not when it contains several:

library(qdapRegex)

#Extract Url
sentence[["URL"]] <- unlist(rm_url(sentence[["sentence"]], extract=TRUE)) 

Any ideas would be highly appreciated.

Jaap
Varun
    Possible duplicate of [Regular expression to find URLs within a string](https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string) **or** [extracting first value from a list](https://stackoverflow.com/questions/20950221/extracting-first-value-from-a-list) – ctwheels Sep 07 '17 at 16:14
    The `str_extract` function from the stringr package should work: `trimws(str_extract(sentence$sentence, "http.+? "))` Assumes the url ends with a space. – Dave2e Sep 07 '17 at 16:26

1 Answer


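You need to index the result to keep only the first element. Note that `unlist()` flattens the matches from every row into one vector, so `[1]` returns the first URL found in the whole column, which suits the single-row data frame in the question: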

#Extract Url
sentence[["URL"]] <- unlist(rm_url(sentence[["sentence"]], extract=TRUE))[1] 
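If the data frame has more than one row, a per-row variant may be needed. The following is a minimal sketch in base R rather than qdapRegex: `first_url` is a hypothetical helper name, and the pattern `https?://[^[:space:]]+` is a simplifying assumption (a URL runs until the next whitespace), so trailing punctuation would be captured as part of the match. It relies on `regexpr()`, which reports only the first match in each string.

```r
# Example data from the question, built as a one-column data frame.
sentence <- data.frame(
  sentence = "thanks for coming please visit https://www.stackoverflow.com for more or look me up on https://www.linkedin.com",
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first URL in each string, or NA if none.
# Assumption: a URL starts with http:// or https:// and ends at whitespace.
first_url <- function(x) {
  m <- regexpr("https?://[^[:space:]]+", x)  # first match per element only
  hit <- !is.na(m) & m != -1                 # elements that contain a URL
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)               # regmatches() drops non-matches
  out
}

# One value per row, so the assignment works for any number of rows.
sentence[["URL"]] <- first_url(sentence[["sentence"]])
sentence[["URL"]]
# [1] "https://www.stackoverflow.com"
```

Because the helper returns exactly one value per input string, rows without a URL get `NA` instead of shifting the results of later rows.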
Kelli-Jean