library(readxl)      # provides read_excel()
library(data.table)  # provides data.table()

data_before <- read_excel("C:/Users/babyb/Desktop/Derrick Rancourt/Canadian Biotech Companies.xlsx", col_names = FALSE)
companyName <- subset(na.omit(data_before, cols = 1), select = -c(2, 3, 4))
data_now <- setNames(data.table(matrix(nrow=0, ncol=2)), c("Company Name", "Website"))

for(value in companyName){
        searchTerm <- paste(value)
        print(searchTerm)
        firstLink <- get_link(searchTerm)
        print(firstLink)
        #this_row <- data.frame(value, firstLink)
        #names(this_row)<-c("Company Name", "Website")
        #data_now <- rbind(data_now, this_row)
}

get_link is a function defined earlier. If companyName were

    1  
1   a 
2   3
3   b
4   2

then print(searchTerm) prints

[1] a
[2] 3
[3] b
[4] 2

as expected. But print(firstLink) only prints

[1] get_link("a")

when I want it to print

[1] get_link("a")
[2] get_link("3")
[3] get_link("b")
[4] get_link("2")

I'm using the code for get_first_google_link from this answer (https://stackoverflow.com/a/57441619/14084227). The code is:

get_first_google_link <- function(name, root = TRUE) {
  url = URLencode(paste0("https://www.google.com/search?q=",name))
  page <- xml2::read_html(url)
  # extract all links
  nodes <- rvest::html_nodes(page, "a")
  links <- rvest::html_attr(nodes,"href")
  # extract first link of the search results
  link <- links[startsWith(links, "/url?q=")][1]
  # clean it
  link <- sub("^/url\\?q\\=(.*?)\\&sa.*$","\\1", link)
  # get root if relevant
  if(root) link <- sub("^(https?://.*?/).*$", "\\1", link)
  link
}

Why is the loop not behaving as expected? The expected output is outlined above. I'm using R, and the output shown is from the RStudio console. Can someone please help?

Phil
JoEy

1 Answer


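The loop isn't working as expected because companyName is a data frame (read_excel returns a tibble, and subset keeps it that way). A for loop over a data frame iterates over its columns, not its rows, so the body runs only once, with value set to the entire first column. get_link then receives all the names in a single call, which is consistent with only one link being printed: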

for(value in companyName)
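
To see this in isolation, here is a minimal sketch using a made-up one-column data frame. The column name X1 and the values are placeholders that mirror the example in the question, not your actual data. It counts how many times the loop body runs:

```r
# Hypothetical stand-in for companyName: one column, four rows
companyName <- data.frame(X1 = c("a", "3", "b", "2"), stringsAsFactors = FALSE)

iterations <- 0
for (value in companyName) {
  iterations <- iterations + 1
  # value is the whole column here, not a single company name
  print(length(value))
}
print(iterations)
```

The body runs once per column, so with a single column you get a single iteration whose value holds all four entries.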

You simply need to convert it to a vector using unlist like so:

companyName <- unlist(subset(na.omit(data_before, cols = 1), select = -c(2, 3, 4)))

And the loop should work.
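
If you also want to collect the results into a table, as the commented-out lines in the question seem to intend, something along these lines may help. This is only a sketch: get_link below is a stand-in stub that fabricates a placeholder string so the example runs without network access, and the data frame is made up. Swap in your real function and data.

```r
# Stand-in for the real get_link(); it only builds a placeholder string
# so that the example runs offline. Replace it with the actual function.
get_link <- function(name) paste0("link-for-", name)

# Hypothetical stand-in for the data frame read from Excel
companyName <- data.frame(X1 = c("a", "3", "b", "2"), stringsAsFactors = FALSE)

# Flatten the single column into a plain character vector
companyVector <- unlist(companyName, use.names = FALSE)

# Call get_link once per company name; each call must return one string
links <- vapply(companyVector, get_link, character(1), USE.NAMES = FALSE)

# Build the result table in one step instead of rbind-ing inside a loop
data_now <- data.frame(
  "Company Name" = companyVector,
  "Website" = links,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
print(data_now)
```

Building the table once from two vectors avoids growing it row by row with rbind, which gets slow as the number of companies increases.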

Dealec