
I am trying to test csv files generated using csv.writer on Python 2 & 3 on Linux (Ubuntu 16.04 x64) and Windows (x64). Unfortunately, Windows adds an extra \r every time it writes to a file.

I thought I would debug this by trying to print repr of the lines in these files, but the output of repr doesn't show the presence of a \r on Windows.

For example, a line from a csv file is shown like this in the terminal on Windows:

'display,resource,refs\n'

The same line from the same file is shown like this on Ubuntu:

'display,resource,refs\r\n'

How can I possibly debug these extra \r that get added to my files?

goelakash
  • How are you printing those lines? What are you actually doing where this matters? Windows and Unix use different line-ending conventions, so text files will be slightly different on the two systems. But you can convert between them with various programs, and many programs can recognize either format without requiring you to do anything. – BrenBarn Jun 21 '16 at 20:38
  • How are you opening the file? You probably want `wb` and not `w`. – MatsLindh Jun 21 '16 at 20:38
  • @BrenBarn Running a test suite with pre-stored hashes on these files gives errors on Windows. I am trying to get past those by removing the extra carriage returns. – goelakash Jun 21 '16 at 20:40
  • @goelakash did you take a look at this http://stackoverflow.com/questions/3191528/csv-in-python-adding-an-extra-carriage-return – msvalkon Jun 21 '16 at 21:29
  • @msvalkon Opening in binary is not an option, but I did find a utility that produces files exactly the way I want them. – goelakash Jun 21 '16 at 21:40

2 Answers


I found a utility called dos2unix installable through pip.

To convert every \r\n in my csv file to \n, I can use this code:

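import subprocess
# Pass the arguments as a list so filenames containing spaces are handled correctly.
subprocess.call(["dos2unix", "-n", filename, filename])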
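
If installing an external tool is not convenient, the same conversion can be sketched in pure Python by working on raw bytes. This is only a minimal sketch, not a description of what dos2unix does internally; the function name crlf_to_lf and the sample file are hypothetical.

```python
import os
import tempfile


def crlf_to_lf(path):
    """Rewrite the file at `path` in place, replacing every CRLF with LF."""
    # Binary mode stops Python from translating line endings on its own.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))


# Usage on a throwaway file (hypothetical name and contents).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "wb") as f:
    f.write(b"display,resource,refs\r\n1,2,3\r\n")

crlf_to_lf(sample_path)

with open(sample_path, "rb") as f:
    converted = f.read()
print(repr(converted))
```

Because the file is read and written in binary mode, the result is the same on Linux and Windows.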
goelakash

The problem is that Windows and Linux define the end-of-line marker differently. For Linux it's just a line feed, '\n', but for Windows it is a carriage return followed by a line feed, '\r\n'.

When you open a file for writing in text mode in Python on Windows, any line feed characters are automatically converted to '\r\n'. This is desirable because many other Windows programs (Notepad, for example) do not treat a line feed on its own as a new line marker.
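
With csv.writer this translation can compound: the writer's default line terminator is already '\r\n', so a translating text-mode file turns it into '\r\r\n'. The sketch below reproduces that on any platform under Python 3 by passing newline="\r\n" to open, which mimics the Windows text-mode default. The file name is hypothetical.

```python
import csv
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# newline="\r\n" makes Python translate each "\n" written into "\r\n",
# which is what text mode does by default on Windows.
with open(demo_path, "w", newline="\r\n") as f:
    csv.writer(f).writerow(["display", "resource", "refs"])

# Read the raw bytes so nothing is translated back on the way in.
with open(demo_path, "rb") as f:
    raw = f.read()

print(repr(raw))
```

The raw bytes end in '\r\r\n': one '\r' comes from csv.writer and the other from the text-mode translation.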

To get around this, you can explicitly tell Python what you want the new line marker to look like, e.g.:

with open("text.txt", "w", newline="\n") as f:
    f.write("hello\n")
    f.write("world\n")

# open in binary mode so we can see exactly what is in the file
with open("text.txt", "rb") as f:
    data = f.read()

print(repr(data))
assert data == b"hello\nworld\n" 

If you are using Python 2, the built-in open does not accept a newline argument, so you need to use the open function in the io module instead.
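
For the csv.writer case in the question, one possible arrangement under Python 3 is to switch off newline translation with newline="" (as the csv module documentation recommends) and ask the writer for a bare "\n" through its lineterminator parameter. This is a sketch, assuming Python 3 and a hypothetical file name. It also reads the result back in binary mode, which is one way to see the \r characters that text-mode reading hides.

```python
import csv
import os
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), "out.csv")

# newline="" disables translation, so the file receives exactly what
# csv.writer emits; lineterminator="\n" replaces the default "\r\n".
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f, lineterminator="\n")
    writer.writerow(["display", "resource", "refs"])
    writer.writerow([1, 2, 3])

# Binary mode shows the bytes on disk, including any "\r".
with open(out_path, "rb") as f:
    written = f.read()

print(repr(written))
```

Files written this way should be byte-identical on Linux and Windows, which matters when tests compare them against pre-stored hashes.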

Dunes