Big picture: I have a large number of CSV files (all in the same format) that I am processing with R. My practice is to use file.append to concatenate them into a small number of larger files and process from there. Now I have a new issue: the CSV files are arriving with headers, so if I simply append them, the headers of every file except the first will get mixed in with the data. I'm looking for an efficient solution; how should I tackle this while optimizing processing time and memory?
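For concreteness, here is a minimal sketch of what I do today (the directory and file names are made up for illustration); this worked fine while the files had no headers:

```r
## Concatenate many small CSVs into one big file without parsing them in R.
files  <- list.files("incoming", pattern = "\\.csv$", full.names = TRUE)
target <- "combined.csv"

file.copy(files[1], target)                  # seed the big file with the first CSV
for (f in files[-1]) file.append(target, f)  # append the rest; R never reads the data
```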
In particular:
[1] What is the most efficient way to remove the header / first line of a file, assuming the header is very small compared to the overall size of the file? (A sketch of what I mean follows this list.)
[2] Are the resources required proportional to the file size, or is there just a fixed overhead?
[3] Is there a different approach that lets me append the files without their headers, without actually reading the whole files into memory?
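To make [1] concrete, this is the naive version I know how to write (the helper name is just for illustration); it rewrites the whole file, which is exactly the cost I'm asking about:

```r
## Strip the first line of a file in place, the naive way:
## the entire file is read into memory and then rewritten.
strip_header <- function(path) {
  lines <- readLines(path)     # whole file ends up in memory
  writeLines(lines[-1], path)  # rewrite the file without its header line
}
```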
Update and clarification: if I could load all the files into memory the problem would be easy, but I cannot load all the data at once in any case. I am pre-processing the files in order to work out which portions I need. Because there are so many files, the number of files itself becomes the bottleneck: opening each file, gathering the information I need from it, and then combining that information dominates the run time. Therefore, as a middle ground, concatenating/appending the files into larger chunks worked well until now; file.append is efficient because it never actually reads the files. Since the CSV files now contain headers, I would like to find a way to append them without reading all of their content. I am able to read and append them in chunks (see the sketch below), but once again that would slow my process down, adding another expensive full read of the content.
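This is roughly what I mean by reading and appending in chunks (the function name and chunk size are arbitrary, for illustration only); it works, but it pushes every byte of every file through R again:

```r
## Append src to target, skipping src's header, a chunk of lines at a time.
append_without_header <- function(src, target, chunk_size = 100000L) {
  in_con  <- file(src, open = "r")
  out_con <- file(target, open = "a")
  on.exit({ close(in_con); close(out_con) })
  readLines(in_con, n = 1)                      # discard the header line
  repeat {
    chunk <- readLines(in_con, n = chunk_size)  # read at most chunk_size lines
    if (length(chunk) == 0) break
    writeLines(chunk, out_con)                  # append them to the big file
  }
}

## e.g. after seeding target with the first file:
## for (f in files[-1]) append_without_header(f, target)
```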