
I have a CSV file whose second column contains links, and I would like to fill the third column with each link's domain name. I know there is an AWK command that does this for a single URL:

echo http://news.blogs.cnn.com/2013/04/15/explosions-near-finish-of-boston-marathon/?hpt=hp_t1 | awk -F/ '{print $3}'

I would like to get this result in the third column for each URL. I tried doing this in R, but it didn't work. Is there another way to do this, e.g. through the terminal?

EDIT: Or, how can I insert a variable into the system() call? For example, if the variable a holds my URL, I want to call:

system("echo 'a' | awk -F/ '{print $3}'")
dira
  • For R, see [this question](http://stackoverflow.com/questions/17285439/does-r-have-any-package-for-parsing-out-the-parts-of-a-url/17286485#17286485). – Thomas Jul 02 '13 at 10:42

1 Answer


I think the OP wants to know how to use awk or cut from R by inserting a variable into the system call.

One way to do that is to use sprintf to build the command string that is then passed to system.

## the URL only; "echo" is added when the command is built below
a <- "http://news.blogs.cnn.com/2013/04/15/explosions-near-finish-of-boston-marathon/?hpt=hp_t1"

### with Awk
cmd <- sprintf("echo %s | awk -F/ '{print $3}'", a)
system(cmd, intern = TRUE)
## [1] "news.blogs.cnn.com"


### Using cut
cmd2 <- sprintf("echo %s | cut -d/ -f3", a)
system(cmd2, intern = TRUE)
## [1] "news.blogs.cnn.com"

system is not vectorized: given several commands it runs only the first, so you can't apply the same approach directly to a column with more than one URL.

So you need to "vectorize" the system function first:

system_vect <- Vectorize(system, vectorize.args = "command", USE.NAMES = FALSE)

b <- "http://www.r-bloggers.com/some-common-approaches-for-analyzing-likert-scales-and-other-categorical-data/"

cmd3 <- sprintf("echo %s | awk -F/ '{print $3}'", c(a, b))
system_vect(cmd3, intern = TRUE)
## [1] "news.blogs.cnn.com" "www.r-bloggers.com"


## for comparison: plain system() only runs the first command
system(cmd3, intern = TRUE)
## [1] "news.blogs.cnn.com"
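
Since the question also asks about doing this from the terminal, here is a minimal sketch that lets awk process the whole file in one pass instead of calling system once per URL. The file names links.csv and links_with_domain.csv are hypothetical, and the sketch assumes a simple comma-separated file with no header row and no commas or quoted fields inside the URLs.

```shell
# Hypothetical sample file: an id in column 1, a URL in column 2.
printf '%s\n' \
  '1,http://news.blogs.cnn.com/2013/04/15/explosions-near-finish-of-boston-marathon/?hpt=hp_t1' \
  '2,http://www.r-bloggers.com/some-common-approaches-for-analyzing-likert-scales-and-other-categorical-data/' \
  > links.csv

# Split each URL on "/" and store the third piece (the host) in column 3.
# Assumes plain comma-separated fields; quoted fields containing commas would break this.
awk -F, -v OFS=, '{ split($2, parts, "/"); $3 = parts[3]; print }' links.csv > links_with_domain.csv

cat links_with_domain.csv
```

Each output line is the original row with the domain appended as a third field. The split on "/" is the same idea as awk -F/ '{print $3}' above, only applied to the second CSV column rather than to the whole line.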
dickoa