I have constructed a data frame by row binding different web-scraped tables.
# html files
filelist <- c("Prod223_2688_00185641_20190930.html","Prod224_0078_SO305092_20191130.html",
"Prod224_0078_SO305426_20190831.html", "Prod224_0078_SO305431_20190831.html",
"Prod224_0078_SO305440_20190831.html", "Prod224_0078_SO305451_20200331.html",
"Prod224_0078_SO306088_20190531.html", "Prod224_0078_SO306098_20180630.html",
"Prod224_0078_SO306098_20190630.html", "Prod224_0078_SO306411_20190530.html")
# web scraping tables
mydata <- lapply(filelist, function(x) {
read_html(x) %>% rvest::html_table(fill = T) %>%
dplyr::nth(2)
})
# row binding (adding a new column with row .id)
mydata <- rbindlist(mydata, idcol=T, fill = T)
I want to create a new column company
with the respective name from filelist
based on row .id. The company name is the third code in between _
. To get something like this:
mydata
company id. X2 ..
00185641 1 ..
00185641 1 ..
SO305092 2 ..
SO305426 3 ..
SO305426 3 ..
This may be quite a simple question but I am not confident with functions in R yet. I have seen this similar questions and tried:
mydata2 <- mydata2 %>% mutate(company=lapply(mydata2,filelist))
# and this:
mydata2$company <- rep(paste(filelist), length(mydata2$.id))