I am currently using the function below to read in and combine several (7) CSVs in R:
library(data.table)

csv_append <- function(file_path = filePath) {
  ## Find the CSVs to combine
  files <- list.files(path = file_path, pattern = "final_data_dummied_", full.names = TRUE)
  ## Load all files into a list of data.tables
  df_list <- lapply(files, fread, nThread = 4)
  DT <- rbindlist(df_list, fill = TRUE)
  ## Convert the data.table to a data.frame (in place, no copy)
  df_seg <- setDF(DT)
  rm(list = c("DT", "df_list"))
  ## Replace missing values with 0
  df_seg[is.na(df_seg)] <- 0
  return(df_seg)
}
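For reference, I call it like this, with filePath set to the directory that holds the seven files (placeholder path shown):

filePath <- "path/to/csvs"   # directory containing the final_data_dummied_* files
df_seg <- csv_append(file_path = filePath)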
However, the original files are large (about 0.5 million rows and ~3,500 columns each). The number of columns varies from 3,400 to 3,700, and when I combine these files R gives a memory error: cannot allocate vector of size 85Gb.
I am thinking that if I take the intersection of the columns across all the CSVs and read in only those columns from each file, it might solve the problem.
But I am not sure how to do that while reading the files in; a rough sketch of what I have in mind is below.
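This is untested. I believe fread's select argument can restrict which columns are read, and that fread with nrows = 0 returns just an empty table with the column names, but I am not sure this is the right way to put the pieces together:

library(data.table)

files <- list.files(path = filePath, pattern = "final_data_dummied_", full.names = TRUE)

## Read only the header of each file (nrows = 0 gives an empty table with column names)
headers <- lapply(files, fread, nrows = 0)

## Keep only the columns present in every file
common_cols <- Reduce(intersect, lapply(headers, names))

## Read just those columns from each file, then stack
df_list <- lapply(files, fread, select = common_cols, nThread = 4)
DT <- rbindlist(df_list)  # fill = TRUE no longer needed: all tables share the same columns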
Can someone please help me with this?