I have a larger CSV file (about 550 MB) and a smaller CSV file (about 5 MB), and I want to combine all the rows into one CSV file. Both have the same header (same order, values, and number of columns); the larger file simply has more rows. I'm using 32-bit Python (can't change it) and I'm having issues appending the CSVs.

The top answer and the one after it both seem to work here: How do I combine large csv files in python?. However, this takes an ungodly amount of time and I am looking for ways to speed it up. Also, when I stop the code from the second answer to the linked question (since it takes so long to run), the first row in the resulting CSV is always empty. I guess that when you call pd.to_csv(..., mode='a', ...), it appends below the first row of the CSV. How do you ensure the first row is populated?
-
Are the files the same? I mean the same number, type and names of columns. If they are, you can use Python's [write](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) function instead and append to the file line by line, rather than using pandas. – sammywemmy Apr 07 '20 at 01:07
-
The top answer for the linked question uses this and I tried it, but it still takes probably an hour or two to run, maybe even longer. I usually stop a run if it takes too long. – Chunky Monkey Apr 07 '20 at 01:51
1 Answer
This is much simpler on the Linux command line, and it doesn't need to load the file into memory.
Use the tail command. The +2 tells tail to start output at line 2, so it skips the one-line header of the smaller file (+1 would output the whole file, header included):
tail -n +2 small.csv >> giant.csv
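This should do the trick, provided giant.csv already ends with a newline; otherwise the first appended row gets joined onto the last existing row.
If you need to do it in Python, opening the large file in append mode and copying the smaller file's rows into it should work. That streams the data, so neither file has to be loaded into memory.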
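For the Python route, here is a minimal sketch of the append-mode idea using only the standard library. The function name append_csv_rows and the chunk_size parameter are made up for illustration, and it assumes the header occupies exactly one physical line (no quoted line breaks in it) and that the destination file already contains the header. It copies in chunks, so it should stay well within 32-bit Python's memory limits:

```python
import os
import shutil


def append_csv_rows(src_path, dst_path, chunk_size=1024 * 1024):
    """Append the data rows of src_path to dst_path, skipping src's header.

    Hypothetical helper: assumes the header is a single physical line.
    """
    # Check whether the destination already ends with a newline, so the
    # first appended row does not get glued onto the last existing row.
    needs_newline = False
    if os.path.exists(dst_path) and os.path.getsize(dst_path) > 0:
        with open(dst_path, "rb") as dst:
            dst.seek(-1, os.SEEK_END)
            needs_newline = dst.read(1) != b"\n"

    # Binary mode avoids any decoding work; "ab" only ever writes at the end.
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.readline()  # read and discard the header line of the source
        if needs_newline:
            dst.write(b"\n")
        # Stream the remaining bytes in chunks instead of reading them all.
        shutil.copyfileobj(src, dst, chunk_size)
```

Calling append_csv_rows("small.csv", "giant.csv") mirrors the tail command above. Since nothing is parsed, the cost is essentially that of copying 5 MB to disk.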

tzujan
-
Sorry, I don't know Windows well enough and don't have a PC in front of me. That said, many Windows users install the GNU Core Utilities so that they can run the traditional Unix/Linux commands. Might be worth checking out. – tzujan Apr 07 '20 at 03:12