library(readxl)      # provides read_excel()
library(data.table)  # provides data.table()
data_before <- read_excel("C:/Users/babyb/Desktop/Derrick Rancourt/Canadian Biotech Companies.xlsx", col_names = FALSE)
companyName <- subset(na.omit(data_before, cols = 1), select = -c(2, 3, 4))
data_now <- setNames(data.table(matrix(nrow=0, ncol=2)), c("Company Name", "Website"))
for (value in companyName) {
  searchTerm <- paste(value)
  print(searchTerm)
  firstLink <- get_link(searchTerm)
  print(firstLink)
  #this_row <- data.frame(value, firstLink)
  #names(this_row) <- c("Company Name", "Website")
  #data_now <- rbind(data_now, this_row)
}
get_link is a function defined earlier. If companyName were
  1
1 a
2 3
3 b
4 2
then print(searchTerm) prints
[1] a
[2] 3
[3] b
[4] 2
as expected. But print(firstLink) only prints
[1] get_link("a")
when I want it to print
[1] get_link("a")
[2] get_link("3")
[3] get_link("b")
[4] get_link("2")
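A minimal sketch that reproduces the loop structure without the spreadsheet or a network call, assuming companyName is a one-column data frame like the one shown above (the column name X1 and the counters n_df and n_col are made up for illustration). It only counts how many times each loop body runs, which may help narrow down where the behaviour diverges:

```r
# Hypothetical one-column data frame standing in for companyName
companyName <- data.frame(X1 = c("a", "3", "b", "2"), stringsAsFactors = FALSE)

# Loop over the data frame itself, as in the code above
n_df <- 0
for (value in companyName) {
  n_df <- n_df + 1
  print(length(value))  # how many elements `value` holds on this pass
}

# Loop over the values stored in the first column
n_col <- 0
for (value in companyName[[1]]) {
  n_col <- n_col + 1
}

# Number of iterations for each kind of loop
print(c(over_data_frame = n_df, over_column = n_col))
```

If the two counts differ, then get_link would be called a different number of times depending on which object the loop runs over.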
I'm using the code for get_first_google_link from this answer: https://stackoverflow.com/a/57441619/14084227
The code is:
get_first_google_link <- function(name, root = TRUE) {
  url = URLencode(paste0("https://www.google.com/search?q=", name))
  page <- xml2::read_html(url)
  # extract all links
  nodes <- rvest::html_nodes(page, "a")
  links <- rvest::html_attr(nodes, "href")
  # extract first link of the search results
  link <- links[startsWith(links, "/url?q=")][1]
  # clean it
  link <- sub("^/url\\?q\\=(.*?)\\&sa.*$", "\\1", link)
  # get root if relevant
  if (root) link <- sub("^(https?://.*?/).*$", "\\1", link)
  link
}
Why is the loop not behaving as expected? The expected output is outlined above. I'm using R, and the output shown is from the RStudio console. Can someone please help?