I have a large CSV file that I need to split into smaller files. I have managed to split it using the Python code below:

 import csv

 divisor = 500000

 outfileno = 1
 outfile = None

 with open('file_temp.txt', 'r') as infile:
     for index, row in enumerate(csv.reader(infile)):
         if index % divisor == 0:
             if outfile is not None:
                 outfile.close()
             outfilename = 'big-{}.csv'.format(outfileno)
             outfile = open(outfilename, 'w')
             outfileno += 1
             writer = csv.writer(outfile)
         writer.writerow(row)

The problem I'm facing is that the header row ends up only in the first output file and is not copied to the others. How can I modify my code so that every split file starts with the header?

Siddhartha
  • Possible duplicate of [Pythonically add header to a csv file](https://stackoverflow.com/questions/20347766/pythonically-add-header-to-a-csv-file) – ivan_pozdeev Jun 11 '18 at 09:12
  • Possible duplicate of [splitting one csv into multiple files in python](https://stackoverflow.com/questions/36445193/splitting-one-csv-into-multiple-files-in-python) – Mr. T Jun 11 '18 at 09:13
  • Those did not solve my issue; I could not understand how to run them, as I am new to Python. – Siddhartha Jun 11 '18 at 12:12

1 Answer

You just need to cache the header row and then write it out for each CSV file, something like:

import csv

divisor = 500000
outfileno = 1
outfile = None

try:
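    # Python 3: newline='' lets the csv module handle line endings itself
    with open('file_temp.txt', 'r', newline='') as infile:
        infile_iter = csv.reader(infile)
        header = next(infile_iter)  # cache the header row
        for index, row in enumerate(infile_iter):
            if index % divisor == 0:
                # Start a new output file every `divisor` data rows
                if outfile is not None:
                    outfile.close()
                outfilename = 'big-{}.csv'.format(outfileno)
                outfile = open(outfilename, 'w', newline='')
                outfileno += 1
                writer = csv.writer(outfile)
                writer.writerow(header)  # repeat the header in every file
            writer.writerow(row)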
finally:
    # Don't forget to close the last file
    if outfile is not None:
        outfile.close()
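
If you want something reusable, the same header-caching idea can be wrapped in a function. This is only a sketch, assuming Python 3; the names `split_csv`, `rows_per_file` and `prefix` are made up for illustration and are not part of any library. It uses `itertools.islice` to take one chunk of rows at a time, and a `with` block per output file so nothing needs closing by hand:

```python
import csv
from itertools import islice


def split_csv(in_path, rows_per_file, prefix="big-"):
    """Split in_path into numbered CSV files, repeating the header in each.

    Hypothetical helper: returns the list of output file names written.
    rows_per_file is assumed to be at least 1.
    """
    written = []
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(in_path, "r", newline="") as infile:
        reader = csv.reader(infile)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split
        while True:
            # Peek at the next data row so no header-only file is created.
            first = next(reader, None)
            if first is None:
                break
            out_name = "{}{}.csv".format(prefix, len(written) + 1)
            with open(out_name, "w", newline="") as outfile:
                writer = csv.writer(outfile)
                writer.writerow(header)
                writer.writerow(first)
                # Stream the remaining rows of this chunk straight to disk.
                writer.writerows(islice(reader, rows_per_file - 1))
            written.append(out_name)
    return written
```

For example, `split_csv('file_temp.txt', 500000)` would produce `big-1.csv`, `big-2.csv`, and so on, using the same naming as the code above.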

Since you're only working with whole lines, you don't really need the csv module. Here's a version that works without it (it assumes no quoted field contains a line break):

divisor = 500000
outfileno = 1
outfile = None

try:
    with open('file_temp.txt', 'r') as infile:
        header = next(infile)
        for index, row in enumerate(infile):
            if index % divisor == 0:
                if outfile is not None:
                    outfile.close()
                outfilename = 'big-{}.csv'.format(outfileno)
                outfile = open(outfilename, 'w')
                outfileno += 1
                outfile.write(header)
            outfile.write(row)
finally:
    # Don't forget to close the last file
    if outfile is not None:
        outfile.close()
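
One thing to keep in mind with the line-based version: it treats every physical line as one record. That is true for most CSV files, but a quoted field is allowed to contain a line break, and then the two counts differ. A small sketch with made-up sample data shows the difference:

```python
import csv
import io

# Hypothetical sample: the second record has a quoted field with a newline in it.
sample = 'id,note\n1,plain\n2,"spans\ntwo lines"\n3,plain\n'

# Counting physical lines, as the line-based splitter effectively does.
physical_lines = sample.splitlines()

# Counting records the way the csv module sees them.
records = list(csv.reader(io.StringIO(sample, newline="")))

print(len(physical_lines))  # 5 physical lines
print(len(records))         # 4 records, header included
```

If your data can contain such fields, the csv-based version is the safer choice, since a line-based split could cut a record in half at a file boundary.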
Nathan Villaescusa
  • Hi, thank you for the quick response. I noticed that when I run the code above, every line in the first file gets a double quote at the start and at the end. This does not happen in the rest of the split files. Please help. Example: "abc,asd,dsf,sdg,sgd" – Siddhartha Jun 11 '18 at 11:07
  • Hmm, that isn't happening for me at all. Since you only care about copying individual lines, you could simplify things and remove all the CSV logic. I will post an updated example. – Nathan Villaescusa Jun 11 '18 at 19:15