I wrote a function to extract links from a string column. It works fine when the data frame is the only argument and the column name is hard-coded, but not when I pass the column name as a second argument.

The working function with one argument:

library(tidyr)     
extractLinks <- function(x) {

  # get all links in new column "url"
  df <- tidyr::extract(x, string, "url", "(http.*)")

  # get clean links and domains
  df <- tidyr::extract(df, url, c("site", "domain"), "(http.*\\.(com|co\\.uk|net))", remove = FALSE)

  return(df)
}

extractLinks(df)

Now I want to add the second argument, but it returns an error:

Error in names(l) <- enc2utf8(into) : 
  'names' attribute [1] must be the same length as the vector [0] 

This is my function with two arguments:

extractLinks <- function(x, y) {

  # get all links in new column "url"
  df <- tidyr::extract(x, y, "url", "(http.*)")

  # get clean links and domains
  df <- tidyr::extract(df, url, c("site", "domain"), "(http.*\\.(de|com|at|ch|ly|co\\.uk|net))", remove = FALSE)
  return(df)
}

extractLinks(df, string)

For replication, an example dataframe:

string
my text in front of the link http://www.domain.com
my text in front of the link http://www.domain.com
my text in front of the link http://www.domain.com
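
A reproducible version of this example can be built directly in R. This is only a sketch: it assumes the column is a plain character vector named string, as the listing suggests.

```r
# Example data frame with a single character column "string"
df <- data.frame(
  string = rep("my text in front of the link http://www.domain.com", 3),
  stringsAsFactors = FALSE
)
```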

Any idea what's wrong?

kabr
  • So `df` is a data.frame with a column named `string`? Putting the data into a [reproducible format](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would make that clearer. It looks like you want your function to perform non-standard evaluation, because you want to delay the evaluation of your `y` value. Check out the [dplyr NSE vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html) or the [Advanced R - Non Standard Evaluation](http://adv-r.had.co.nz/Computing-on-the-language.html) section. It's usually more trouble than it's worth. – MrFlick Dec 19 '16 at 21:51

1 Answer

You need to use the standard-evaluation variant extract_() and pass the column name as a string. Inside extractLinks(), change the first call to:

  # get all links in new column "url"
  df <- tidyr::extract_(x, y, "url", "(http.*)")

Then call the function with the column name quoted:

extractLinks(df, "string")
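
On newer tidyr releases, where the underscored variants such as extract_() are deprecated, the same idea can be sketched with tidy evaluation. This is a minimal sketch rather than the code from the answer: it assumes tidyr and rlang are installed, and the parameter name `col` and the sample data frame are illustrative. The column name still arrives as a string. It is converted to a symbol with rlang::sym() and unquoted with !! inside extract().

```r
library(tidyr)

# Sketch: `col` (hypothetical parameter name) holds the column name as a string.
extractLinks <- function(x, col) {
  # get all links in new column "url"; !!rlang::sym(col) injects the column name
  df <- tidyr::extract(x, !!rlang::sym(col), "url", "(http.*)")

  # get clean links and domains
  df <- tidyr::extract(df, url, c("site", "domain"),
                       "(http.*\\.(com|co\\.uk|net))", remove = FALSE)
  df
}

# Example data mirroring the question
df <- data.frame(
  string = rep("my text in front of the link http://www.domain.com", 3),
  stringsAsFactors = FALSE
)

res <- extractLinks(df, "string")
```

Keeping the argument a string means the caller never relies on non-standard evaluation, which is the point of the answer above. Only the mechanism for handing the name to tidyr differs.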
scoa