I'm trying to code a loop for web scraping.
For each name in a list, the loop finds some metrics on a webpage dedicated to that name, and builds a data frame with all the names and their metrics.
Here is the code:
library(purrr)       # map_df
library(rvest)       # read_html, html_nodes, html_text
library(data.table)  # data.table

# link2 is the base URL and names is the vector of names to scrape
Mydata <- map_df(1:40, function(i) {
  link = read_html(paste(link2, names[i], sep = ""))
  htmlnodes = html_nodes(link, ".col_2")
  htmltext = html_text(htmlnodes)
  datatable = data.table(htmltext)
  # each value sits one row below the row holding its metric name
  data.table(name = names[i],
             Var1 = datatable$htmltext[as.numeric(which(grepl("Var1", datatable$htmltext)) + 1)],
             Var2 = datatable$htmltext[as.numeric(which(grepl("Var2", datatable$htmltext)) + 1)],
             Var3 = datatable$htmltext[as.numeric(which(grepl("Var3", datatable$htmltext)) + 1)],
             Var4 = datatable$htmltext[as.numeric(which(grepl("Var4", datatable$htmltext)) + 1)],
             Var5 = datatable$htmltext[as.numeric(which(grepl("Var5", datatable$htmltext)) + 1)],
             Var6 = datatable$htmltext[as.numeric(which(grepl("Var6", datatable$htmltext)) + 1)],
             Var7 = datatable$htmltext[as.numeric(which(grepl("Whitelist/Var7", datatable$htmltext)) + 1)],
             stringsAsFactors = FALSE)
})
(I use which/grepl because all the retrieved data ends up in a single column, and the value of each metric is one row below the name of the metric.)
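To illustrate that lookup on its own, here is a small sketch with made-up data standing in for the scraped text (the vector contents are an assumption, not real page output). It also shows what the same expression gives when a label is not on the page:

```r
# Toy stand-in for html_text(htmlnodes): each label is followed by its value.
htmltext <- c("Var1", "10", "Var2", "20")

# Label present: which() gives the label's position, +1 points at the value.
present <- htmltext[which(grepl("Var1", htmltext)) + 1]

# Label absent: which() returns integer(0), so indexing yields a zero-length vector.
absent <- htmltext[which(grepl("Var3", htmltext)) + 1]

length(present)
length(absent)
```

A zero-length result like the second one seems to be what data.table() cannot recycle into a one-row table.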
With fewer metrics the loop works, but with the full set above I get the following error message:
Error in data.table(name = names[i], Var1 = datatable$htmltext[as.numeric(which(grepl("Var1", :
Item 8 has no length. Provide at least one item (such as NA, NA_integer_ etc) to be repeated to match the 1 rows in the longest column. Or, all columns can be 0 length, for insert()ing rows into.
I guess it means I have to add an ifelse for the cases where the loop does not find a metric such as "hardcap" or "country" on the webpage for the ith name, but I don't know how to do that.
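A minimal sketch of the kind of fallback I have in mind, using toy data in place of the scraped text. The helper name get_metric is hypothetical (it is not from any package), and I have not tried it inside the real loop:

```r
# Hypothetical helper: return the value one row below `label`,
# or NA if the label does not appear in `text`.
get_metric <- function(text, label) {
  idx <- which(grepl(label, text, fixed = TRUE))
  if (length(idx) == 0) {
    return(NA_character_)  # label missing on this page
  }
  text[idx[1] + 1]         # value sits right after the first match
}

# Toy stand-in for the scraped column (made-up values).
htmltext <- c("Var1", "10", "Var2", "20")

found   <- get_metric(htmltext, "Var1")
missing <- get_metric(htmltext, "Whitelist/Var7")
```

Presumably each Var column in the data.table() call could then be written as get_metric(datatable$htmltext, "Var1") and so on, so that every column has length 1 even when a metric is absent.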
Thanks for your help :)