I want to write the rows of a csv file to another csv file, changing the content of each row along the way: if a row is empty, it should stay empty, and if it is not, any spaces at the beginning and end of the string should be removed. The original csv file has one column and 65422771 rows.

I have written the following to write the rows of the original csv file to the new one:

import csv

csvfile = open('data.csv', 'r')

with open('data 2.csv', "w+") as csv_file1: 
    writer = csv.writer(csv_file1)
    count = 0
    for row in csvfile:
        row = row.replace('"', '')
        count+= 1
        print(count)
        if row.strip() == '':
            writer.writerow('\n')
        else:
            writer.writerow(row)

However, the new csv file ends up with 130845543 rows (= count)! The new csv file is also twice the size of the original one. How can I create the new csv file with exactly the same number of rows, but with the changes described above applied to them?

  • An obvious bug is that you mean `writer.writerow('')`, not `writer.writerow('\n')`. Removing all double quotes looks like a rather strange thing to do, too. You should use a proper `csv.reader` just like you use `csv.writer`. – tripleee Feb 18 '21 at 19:56
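
To see where the extra lines come from: `writerow()` expects a sequence of fields, so passing it a plain string makes it treat every character as a separate field, and a field containing `'\n'` gets quoted across two physical lines. A minimal, self-contained illustration (using an in-memory buffer rather than the files from the question):

import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

# a string is iterated character by character, so each character becomes its own field,
# and the field holding '\n' is quoted, spilling the record over two physical lines
writer.writerow("abc\n")

print(repr(buf.getvalue()))  # 'a,b,c,"\n"\r\n'

Passing a one-element list such as `writer.writerow([row.strip()])`, or `writer.writerow('')` for a blank row, avoids this.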

1 Answer

Try this:

import csv

with open('data.csv', 'r') as file:
    # strip leading/trailing spaces from the single column; keep empty rows empty
    rows = [[row[0].strip()] if row else [] for row in csv.reader(file)]

with open('data_out.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(rows)

Also, as @tripleee mentioned, your file is quite large, so you may want to read and write it in chunks. You can use pandas for that:

import pandas as pd

chunksize = 10_000

# skip_blank_lines=False keeps empty rows, so the output has the same number of rows as the input;
# remove any existing data_out.csv before running, since mode="a" appends to it
for chunk in pd.read_csv('data.csv', chunksize=chunksize, header=None, skip_blank_lines=False):
    chunk[0] = chunk[0].str.strip()
    chunk.to_csv("data_out.csv", mode="a", header=False, index=False)

sarartur
  • Why are you extracting only the first field from each line? – tripleee Feb 18 '21 at 19:58
  • Because the question states: "The original csv file has one column and 65422771 rows." So I am extracting the first and only cell in each row. – sarartur Feb 18 '21 at 19:59
  • Oh nvm, sorry, sloppy reading. You might want to avoid reading the entire file into memory, though. – tripleee Feb 18 '21 at 20:00
  • _csv.Error: field larger than field limit (131072) –  Feb 18 '21 at 20:20
  • How big are the fields in your column? You can try `import sys` and then add `csv.field_size_limit(sys.maxsize)` to the top of your script. – sarartur Feb 18 '21 at 20:24
  • @tripleee good point, I edited the answer. – sarartur Feb 18 '21 at 20:43
  • The trivial way to do that is `with open('data.csv', 'r') as infile, open('data_out.csv', "w", newline = "") as outfile: reader = csv.reader(infile); writer = csv.writer(outfile); for row in reader: writer.writerow([row[0].strip()])` (see the formatted sketch after these comments). – tripleee Feb 19 '21 at 05:57
  • The field limit fix in more detail: https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072 – tripleee Feb 19 '21 at 05:59
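
Putting the comments together, a streamed version (just a sketch, assuming one column per row and that blank rows should stay blank; `csv.field_size_limit(sys.maxsize)` is the fix suggested above for the field-limit error) might look like this:

import csv
import sys

# raise the csv field size limit, as suggested in the comments
# (on some platforms a value smaller than sys.maxsize may be required)
csv.field_size_limit(sys.maxsize)

with open('data.csv', 'r', newline='') as infile, \
        open('data_out.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        # keep blank rows blank; otherwise strip the single field
        writer.writerow([row[0].strip()] if row else [])

This reads and writes one row at a time, so memory use stays flat regardless of the file size.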