0

I'm having trouble editting a csv file using Python. I am receiving a csv file from a supplier and have no flexibility on how the data is being provided. The issue I'm running into, is that one of the columns - the last column in the file - contains very long strings (sometimes upwards of 32000 characters) and when opening the file in excel, the rows with these very long strings in a column will automatically overflow onto the next line. To work around this, I've tried taking the approach of: if there is a very long string in the last column, break that string at a certain point and put a comma (delimiter) there so the very long string is now broken across two columns and will not overflow into a new row. I have achieved this and when opening in a text editor, can see the comma delimiter in the right place. But when opening in excel, I still run into the text overflowing and not getting populated in a second column.

To me, this is suggesting an issue with excel and how it is interpreting the csv file and maybe it is not recognising the comma I have inserted as a delimiter?

The code below is how I've tackled the problem so far:

from tempfile import NamedTemporaryFile
import shutil
import csv

filename = input("File Name in current folder: ")

tempfile = NamedTemporaryFile("w+t", newline='', delete=False)

with open(filename, "rt") as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=",", quotechar='"')
    writer = csv.writer(tempfile, delimiter=",", quotechar='"')
    
    for row in reader:
        if len(row[21]) > 30000:
            row[21] = row[21][:len(row[21])-20004] + "," + row[21][len(row[21])-20003:]
            
        writer.writerow(row)

shutil.move(tempfile.name, filename)

Thanks and any suggestions would be greatly appreciated!

christohew
  • 49
  • 3

1 Answers1

2

Since you are using the csv module, and you aren't adding a record to the array, but rather adding a ',' to the existing record, it will be escaped (i.e.: included in quotes) and written as part of the same "column". What you probably want is to:

  • use python slices features
  • write the headers (with the new column)
  • add a column where necessary:
from tempfile import NamedTemporaryFile
import shutil
import csv

filename = input("File Name in current folder: ")

tempfile = NamedTemporaryFile("w+t", newline='', delete=False)

with open(filename, "rt") as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=",", quotechar='"')
    writer = csv.writer(tempfile, delimiter=",", quotechar='"')
    
    for row in reader:
        if len(row[21]) > 30000:
            row.append(row[21][-20003:])
            row[21] = row[21][:-20003]
            
        writer.writerow(reader.fieldnames + ['overflow'])
        writer.writerow(row)

shutil.move(tempfile.name, filename)
Nicola
  • 81
  • 3