I want to read a CSV file in which each line is terminated by a newline character ('\n'), using Python 3. This is my code:

import csv
with open('input_data.csv', newline='\n') as f:
    csvread = csv.reader(f)
    batch_data = [line for line in csvread]

The code above gave this error:

batch_data = [line for line in csvread]
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

After reading the post "CSV new-line character seen in unquoted field error", I also tried these alternatives:

with open('input_data.csv', 'rU', newline='\n') as f:
    csvread = csv.reader(f)
    batch_data = [line for line in csvread]


with open('input_data.csv', 'rU', newline="\n") as f:
    csvread = csv.reader(f)
    batch_data = [line for line in csvread]

No luck getting this right yet. Any suggestions?

I have also been reading the documentation about newline: "If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling."

So my understanding of the newline parameter is:

1) it is a necessity, but

2) does newline='' mean the input file is split into lines on an empty-space character?

enaJ

1 Answer

  1. newline='' is correct in all csv cases, and failing to specify it is an error in many cases. The docs recommend it for the very reason you're encountering.

  2. newline='' doesn't mean "empty space" is used for splitting; it's specifically documented on the open function:

If [newline] is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.

So with newline='' all original \r and \n characters are returned unchanged. Normally, in universal newlines mode, any newline-like sequence (\r, \n, or \r\n) is converted to \n in the input. But you don't want this for CSV input, because CSV dialects are often quite picky about what constitutes a newline (the Excel dialect, for example, uses \r\n as its line terminator).
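To make the difference concrete, here is a minimal sketch. It assumes the file contains bare \r line endings, which is one plausible cause of the error in the question (the question does not confirm this). The sample bytes and the temporary file are hypothetical stand-ins for input_data.csv.

```python
import csv
import os
import tempfile

# Hypothetical sample data: two records terminated by a bare "\r".
raw = b"a,b\rc,d\r"

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # newline='\n' splits lines only on "\n", so the reader receives
    # "a,b\rc,d\r" as one line and rejects the "\r" in the middle of it.
    try:
        with open(path, newline="\n") as f:
            rows_strict = list(csv.reader(f))
    except csv.Error as exc:
        rows_strict = None
        print("newline='\\n' failed:", exc)

    # newline='' ends a line at "\r", "\n", or "\r\n" and passes the
    # original characters through for the csv module to interpret.
    with open(path, newline="") as f:
        rows_universal = list(csv.reader(f))
    print(rows_universal)
finally:
    os.remove(path)
```

With this sample, the first attempt raises csv.Error and the second yields two records of two fields each.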

Your code should be:

import csv
with open('input_data.csv', newline='') as f:
    csvread = csv.reader(f)
    batch_data = list(csvread)

If that doesn't work, you need to look at your CSV dialect and make sure you're initializing csv.reader correctly.
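As a rough sketch of what checking the dialect could look like, the standard library's csv.Sniffer can guess the delimiter from a sample of the data. The sample string below is hypothetical (in practice it would be the first few kilobytes of the real file), and the candidate delimiters passed to sniff are an assumption about what the file might use.

```python
import csv
import io

# Hypothetical sample standing in for the start of the real file.
sample = "name;score\r\nalice;1\r\nbob;2\r\n"

# Restrict the guess to a few likely delimiters (an assumption).
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
print(repr(dialect.delimiter))

# io.StringIO(..., newline="") mimics open(..., newline="") for in-memory text.
rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))
print(rows)
```

Sniffer is a heuristic, so its guess is worth checking against a few parsed rows before trusting it on the whole file.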

ShadowRanger
  • Thanks so much for pointing me to the right documentation of the open function. Just to confirm I understand you correctly: if the input file uses '\n', the code you recommended would read and split each row properly, right? – enaJ Nov 08 '16 at 02:33
  • I keep asking for confirmation because the input file is too big to open as a CSV (I can't inspect it by eye). The only thing I know about it is that "\n" separates each row. I don't know how to verify that my code separates the rows correctly by comparing the real CSV file with what the code reads in. – enaJ Nov 08 '16 at 02:37
  • @enaJ: Yes. It doesn't matter what line ending convention the input file uses when you use `newline=''`, it will treat _any_ possible line ending as being the end of the line and return the data from that line (including the unconverted characters representing the end of the line). The `csv` module will recognize endings that don't match the CSV dialect and combine lines as needed to match the dialect chosen (and combine lines when the newline occurs inside a quoted field, so an embedded newline in a field doesn't turn it into multiple records on read). – ShadowRanger Nov 08 '16 at 21:00
  • Thanks again for your great help and patience! Let me ask one more question on this front: if newline='' is used for all input cases, how does it differentiate an input file that uses '\n' as the newline delimiter from another file that uses ','? – enaJ Nov 08 '16 at 22:06
  • @enaJ: What format are you using where records (as opposed to fields) are separated by commas? That question doesn't even make sense. For the record, `csv` is documented to ignore the value of `lineterminator` for readers and just treat either `\r` or `\n` as a line terminator; you can't use non-newline-y characters to separate records on read. – ShadowRanger Nov 08 '16 at 23:18