I'm trying to read a comma-delimited txt file into a CSV using csv.reader(). But because my delimiter (,) sometimes appears inside a field, the rest of that row gets shifted one column to the right.

Example:

input.txt:

Stevenson Corp, 123 Main St, 3 employees\n
Johnson Inc, 456 Main St, 5 employees\n

would result in CSV columnized as:

Stevenson Corp | 123 Main St | 3 employees
Johnson Inc | 456 Main St | 5 employees

However, the issue arises when my input file has a comma inside one of the fields being delimited, for example:

input_bad.txt:

Stevenson Corp, 123 Main St, 3 employees\n
Johnson, Inc, 456 Main St, 5 employees\n #notice the comma before Inc

would result in an incorrectly columnized CSV:

Stevenson Corp | 123 Main St | 3 employees #3 columns 
Johnson | Inc | 456 Main St | 5 employees #4 columns (issue)

I can't think of any way to keep "Johnson, Inc" together as one field instead of having it split at the "," delimiter.

My code opens the txt file and creates the CSV reader like this:

inputfile = open(os.path.join(somelocation, "somefile.txt"), "r", encoding="utf-8", errors="replace")

csv_data = csv.reader(inputfile, delimiter = ",")

Please help.

davidhoff22
  • You have an invalid CSV file. The string should have been enclosed in double quotes (`"`) if it contains the delimiter character. See [this question](https://stackoverflow.com/questions/3475856/write-text-with-comma-into-a-cell-in-csv-file-using-python). – Selcuk Oct 10 '18 at 05:24
  • The whole point of a separator is that it only works if your CSV is correctly formatted. If it isn't, I'd suggest you do some pre-processing before reading it as a CSV. – jar Oct 10 '18 at 05:29
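
To illustrate the quoting rule mentioned in the first comment, here is a minimal sketch. It uses an in-memory buffer instead of a real file (an assumption made only to keep the example self-contained) and one sample row from the question. With its default settings, csv.writer quotes only the fields that contain the delimiter, and csv.reader then reads such a field back as a single item.

```python
import csv
import io

# In-memory stand-in for a file; the sample row comes from the question.
buffer = io.StringIO()
writer = csv.writer(buffer)  # default quoting: only fields that need it are quoted
writer.writerow(["Johnson, Inc", "456 Main St", "5 employees"])

# The field containing a comma is wrapped in double quotes on output.
quoted_line = buffer.getvalue()

# Reading the quoted line back keeps "Johnson, Inc" together as one field.
buffer.seek(0)
parsed = next(csv.reader(buffer))
print(parsed)
```

If whoever produces input.txt can write it this way, no workaround is needed on the reading side.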

1 Answer

The best approach would be to go back and change the delimiter in your file from , to something more sensible. If that's not an option, you can do something like this as a workaround:

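import csv
import os

with open(os.path.join(somelocation, "somefile.txt"), "r", encoding="utf-8", errors="replace") as inputfile:
    # '¬' never appears in the data, so each row comes back as a one-item list
    spamreader = csv.reader(inputfile, delimiter='¬')
    for row in spamreader:
        if not row:
            continue  # skip blank lines
        # row[0] is the whole line; split it on the two rightmost commas only
        new_row = row[0].rsplit(",", 2)
        print("|".join(new_row))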

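This uses a delimiter that never appears in your text, so csv.reader doesn't split any line and each row is a one-element list (you could just as well read the lines with inputfile.readlines()). It then uses rsplit to split on the two rightmost commas to create the three columns. Note that this only works if the extra commas appear solely in the first column.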
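
As a rough sketch of the same idea without csv.reader, the lines can be split from the right and then written out with csv.writer, which quotes "Johnson, Inc" so the result is a valid CSV. The helper names split_from_right and convert are hypothetical, the sample lines are the ones from the question, and the code assumes that only the first column ever contains extra commas.

```python
import csv
import io

def split_from_right(line, n_columns=3):
    """Split on the (n_columns - 1) rightmost commas and trim whitespace."""
    parts = line.rstrip("\n").rsplit(",", n_columns - 1)
    return [part.strip() for part in parts]

def convert(lines, out_file, n_columns=3):
    """Write each non-blank input line to out_file as a properly quoted CSV row."""
    writer = csv.writer(out_file)
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(split_from_right(line, n_columns))

# Sample data from the question; a real run would pass an open file instead.
sample = [
    "Stevenson Corp, 123 Main St, 3 employees\n",
    "Johnson, Inc, 456 Main St, 5 employees\n",
]
out = io.StringIO()
convert(sample, out)

# Read the result back to check that every row has three columns.
rows = list(csv.reader(io.StringIO(out.getvalue())))
print(rows)
```

With real files, convert could be called with the opened input file as lines and an output file opened with newline="", as the csv module documentation recommends for files passed to csv.writer.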

Sven Harris