Using Python 2.7 (Anaconda distribution), I need to parse a text file whose lines end inconsistently in either "\n" or "\r\n". When I call open with mode "rb", the script treats "\n" as a line terminator but not "\r\n". When I call open with mode "rU" (which is supposed to provide universal newline support), the script breaks lines at "\r\n" but not at "\n". How can I open the file so that both line terminators are recognized?

import csv

# recognizes "\n" but not "\r\n"
with open(infile, 'rb') as f:
    reader = csv.reader(f, delimiter='|')

# recognizes "\r\n" but not "\n"
with open(infile, 'rU') as f:
    reader = csv.reader(f, delimiter='|')
eric s
  • What about using `open(infile, 'rbU')`? – Hook Apr 09 '21 at 15:22
  • @Hook 'rbU' gives the same result as 'rU'. – eric s Apr 09 '21 at 15:31
  • Friendly side-note that Python 2.7 reached end of life 1 Jan 2020; you could consider moving to 3. – msanford Apr 09 '21 at 16:15
  • Does this answer your question? [How to convert CRLF to LF on a Windows machine in Python](https://stackoverflow.com/questions/36422107/how-to-convert-crlf-to-lf-on-a-windows-machine-in-python) (There's a simple way to do it in-line.) – msanford Apr 09 '21 at 16:15

1 Answer

Is it an option to simply sanitize your input files before feeding them to the Python script? dos2unix will convert your mixed \r\n and \n line endings to just \n for you.
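If running an external tool is not convenient, roughly the same CRLF-to-LF conversion can be sketched in Python itself. This is only an illustration, not what dos2unix does internally: the names `normalize_newlines` and `normalize_file` are hypothetical, and the sketch assumes the file is small enough to read into memory at once.

```python
import io


def normalize_newlines(data):
    """Return the given bytes with every CRLF pair replaced by a single LF."""
    return data.replace(b"\r\n", b"\n")


def normalize_file(src_path, dst_path):
    """Copy src_path to dst_path, rewriting CRLF line endings as LF.

    Both paths are supplied by the caller; the whole file is read into
    memory, which is an assumption that suits modest file sizes only.
    """
    with io.open(src_path, "rb") as src:
        data = src.read()
    with io.open(dst_path, "wb") as dst:
        dst.write(normalize_newlines(data))
```

Working on bytes avoids any platform-specific newline translation, so the same code should behave identically on Windows and Unix.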

sinback
  • Thanks, I tried this, and now it doesn't matter whether I use 'rb' or 'rU', I can't parse `\n` line terminators. After additional diagnostics, it looks like the issue is caused by a field containing a single, unclosed double-quotation mark ("). Good tool to know about though. – eric s Apr 09 '21 at 16:20
  • Oof, yeah, stuff like that is a pain. Does stuff work ok now if you take the " out? – sinback Apr 09 '21 at 16:22
  • After a crash course in sed to remove those characters, it worked. Thanks. – eric s Apr 09 '21 at 16:55
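The stray double-quote described in these comments can also be handled on the parsing side instead of being stripped with sed. The sketch below is an alternative, not what the asker did: it passes `quoting=csv.QUOTE_NONE` so that `csv.reader` treats `"` as an ordinary character, which assumes quotes in this data are never meant to wrap fields. The helper name `parse_pipe_rows` and the sample data are made up for illustration.

```python
import csv


def parse_pipe_rows(lines):
    """Parse pipe-delimited lines, treating '"' as an ordinary character."""
    # QUOTE_NONE disables quote handling, so an unclosed '"' cannot
    # swallow the following line breaks into one long field.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    return [row for row in reader]


# Made-up sample mixing "\r\n" and "\n" endings, with one unclosed quote.
sample = 'a|"unclosed|c\r\nd|e|f\n'
rows = parse_pipe_rows(sample.splitlines(True))
print(rows)
```

With the default quoting, a field that starts with an unclosed `"` makes the reader keep consuming text, line terminators included, while it looks for the closing quote, which would explain why the line endings appeared not to be recognized.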