
I have more than 5,000 .htm files, each representing a firm, and I would like to combine them into a single dataset.

I have been trying to load the data in R as follows (the files are named *.xls because they were misnamed at the source, but they are really *.htm files):

library(rvest)
# the files are HTML despite the .xls extension; pattern is a regular expression
file.list <- list.files(pattern = "\\.xls$")
df.list <- lapply(file.list, read_html)

But this only gives me a list of parsed HTML document objects, and I don't know how to turn them into a data frame with one observation per row.

If I run the following code

data <- read_html("data.xls")
data <- html_table(data, fill = TRUE)[[1]]  # first table in the file
data <- data[-1, ]  # remove the first row, which holds the column names
data <- as.data.frame(data)

I get what I want, but only for one file, so then I would need to repeat this for all the files manually.

Any help?

Thanks m

  • Please add a sample of your dataset using `dput()` to make your question reproducible. Without seeing your dataset, you could try `do.call(rbind.data.frame, your_list)` where "your_list" is the name of your list. Check [here](https://stackoverflow.com/questions/4227223/convert-a-list-to-a-data-frame) for more details. – L Tyrone Mar 26 '23 at 21:52
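
One way to combine the single-file steps from the question with the `do.call(rbind, ...)` idea from the comment is sketched below. It is only a sketch under some assumptions: the helper name `read_firm_table` and the extra `source_file` column are hypothetical, the table of interest is assumed to be the first one in every file, and every file is assumed to yield the same columns in the same order. The `fill` argument is left out because recent rvest versions fill ragged tables by default; with an older rvest it may still be needed.

```r
library(rvest)

# Hypothetical helper: applies the single-file steps from the question to one path.
read_firm_table <- function(path) {
  page <- read_html(path)
  tbl <- html_table(page)[[1]]  # assumption: the first table is the one wanted
  tbl <- tbl[-1, ]              # drop the first row, as in the question
  tbl <- as.data.frame(tbl)
  # hypothetical extra column: record which file each row came from
  tbl$source_file <- rep(path, nrow(tbl))
  tbl
}

# The files are HTML despite the .xls extension
file.list <- list.files(pattern = "\\.xls$")
df.list <- lapply(file.list, read_firm_table)

# Stack all per-file data frames; assumes identical column names across files
all_firms <- do.call(rbind, df.list)
```

If some files have a different layout, `rbind` will stop with an error about mismatched names, which at least points to the files that need separate handling.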
