Using Python to replace "\r\r\n" with "\r\n" in a binary file

Question

I'm very new to Python and just crawling my way through it to accomplish a task and would appreciate some help (Python 3.1).

I have a CSV file written with DictWriter using the "excel" dialect. After the file is created, I notice extra blank lines in it, and on closer inspection it's because each line ends with "\r\r\n" instead of "\r\n".

I could solve this in one of two ways:

1. Open the file in binary mode instead of text mode. The problem is that I cannot for the life of me figure out how to get writerow() to work against a binary file -- I get a ton of exceptions.
2. The second (easier) solution is to just replace every "\r\r\n" with "\r\n".

However, in my attempts I ran into these problems:

a. If I don't close the file first, the search and replace just adds even more "\r\r\n" lines.
b. If I close the file first, re-open it in binary mode and do the same search and replace, I get an error:

WindowsError: [Error 32] The process cannot access the file because it is being used by another process

Here is the code:

# code before this writes to the file in text mode
myfile.close()
myfile = open(outputFile, "wb")
for line in fileinput.FileInput(outputFile, inplace=1):
    line = line.replace("\r\r\n", "\r\n")
    print (line)
myfile.close()

Would appreciate any help anyone can provide!

Side-note: The reason this happened is because you didn't call `open` for the `.csv` file with `newline=''` (the _only_ correct way to open a file that you're going to use with the `csv` module) and you were on Windows, using a Windows-like dialect of `csv`. `csv.writer` wrote `\r\n` explicitly (because the dialect in question used `\r\n` as the newline character) and by failing to disable line ending conversions in `open` with `newline=''`, the `\n` was converted to `\r\n` by `io.TextIOWrapper`, making it `\r\r\n` on disk. — ShadowRanger, Feb 25 '17 at 14:46
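The mechanism described in this comment can be reproduced on any platform with a small sketch. It assumes an in-memory `io.BytesIO` standing in for the file on disk, and it passes `newline="\r\n"` to `io.TextIOWrapper` to mimic the Windows text-mode default. The helper name `csv_bytes` is made up for illustration.

```python
import csv
import io


def csv_bytes(newline):
    """Write one CSV row through a text wrapper and return the raw bytes."""
    raw = io.BytesIO()  # assumption: stands in for the file on disk
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    # The "excel" dialect terminates each row with "\r\n" on its own.
    csv.writer(text, dialect="excel").writerow(["a", "b"])
    text.flush()
    data = raw.getvalue()
    text.detach()  # stop the wrapper from closing `raw` later
    return data


# newline="\r\n" mimics Windows text mode: the "\n" inside the writer's
# "\r\n" is translated a second time.
print(csv_bytes("\r\n"))  # b'a,b\r\r\n'

# newline="" disables translation, so the terminator is written as-is.
print(csv_bytes(""))      # b'a,b\r\n'
```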

Alex Martelli · Accepted Answer · 2010-03-05T03:28:04.650

5

The safe way to alter a file (with the exception of appending, which can safely be done in place) is to copy it, with the modification, to a new file, remove the old one, and rename the new one to the old name. This is the one solid way to avoid catastrophic errors and data loss. Depending on the platform, the "remove old, rename new" step can be atomic, but that's hard on Windows and not all that crucial.

So I'd simply do that -- in one big gulp, unless the file is horribly huge (gigabyte-plus):

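import os

# Python 2 form: read everything, write the fixed copy, then swap the files.
with open(filename, 'rb') as f:
    data = f.read()
with open(newfilename, 'wb') as f:
    f.write(data.replace('\r\r\n', '\r\n'))
os.unlink(filename)               # remove the old file
os.rename(newfilename, filename)  # give the new file the old name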

The problems in your code come from confusing binary and text mode: you can't properly "read a line" from a file opened in binary mode, for example.

Edit: in Python 3.1 we need to deal with bytes instances here, not strings, since the files are binary ones. So, per the docs, the write call must become

  f.write(data.replace(b'\r\r\n', b'\r\n'))

Those b prefixes tell Python we're dealing with bytes, not strings.
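Putting the pieces together for Python 3, a minimal sketch might look like the following. The function name `fix_line_endings` and the `.tmp` suffix are illustrative only, and `os.replace` is assumed to be available (Python 3.3 or later); on 3.1 the `os.unlink` plus `os.rename` pair from above would be needed instead.

```python
import os


def fix_line_endings(filename):
    """Rewrite `filename` so that every CR CR LF becomes CR LF."""
    tmpname = filename + ".tmp"  # hypothetical temporary name

    # Read the whole file as bytes; fine unless the file is huge.
    with open(filename, "rb") as f:
        data = f.read()

    # Write the corrected copy next to the original.
    with open(tmpname, "wb") as f:
        f.write(data.replace(b"\r\r\n", b"\r\n"))

    # Swap the corrected copy into place (overwrites the original).
    os.replace(tmpname, filename)
```

The file being fixed must already be closed by whatever wrote it; on Windows an open handle is what produces the "being used by another process" error.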

edited Mar 05 '10 at 03:28

answered Mar 05 '10 at 03:09

Alex Martelli


I just tried this but I'm getting this error: "TypeError: expected an object with the buffer interface" on this line: "f.write(data.replace('\r\r\n', '\r\n'))" – TMC Mar 05 '10 at 03:20
@TMC, you should have mentioned you are using Python3 ;) – John La Rooy Mar 05 '10 at 03:24
Ah, Python 3.1 -- I noticed it just now in your question's body (there's a specific tag for it, since so often the proper answers differ drastically between the 2.5/2.6 that almost everybody is using, and the newer 3.1). The solution is at: http://docs.python.org/3.1/library/stdtypes.html#bytes-and-byte-array-methods -- let me edit the answer to clarify. – Alex Martelli Mar 05 '10 at 03:26
@gnibbler, he did (in a parenthesis hiding at the end of the first paragraph), just not prominently enough for me to notice (i.e., ideally as a tag ;-). I've now edited the answer to show the tiny change needed for Python 3 purposes. – Alex Martelli Mar 05 '10 at 03:29

score 1 · Answer 2 · edited May 23 '17 at 12:16


Also, the problem you are having with \r\r\n could be caused by you being on the Windows platform and by opening the file in text mode, rather than in binary mode.

I was having this problem, and found the answer here: "Python 2 CSV writer produces wrong line terminator on Windows".

answered Aug 28 '10 at 18:12

Kerridge0


score 0 · Answer 3 · answered Mar 05 '10 at 03:09


Try this:

fileR = open(outputFile, "r")
text = fileR.read().replace("\r\r\n", "\r\n")
fileR.close()
fileW = open(outputFile, "wb")
fileW.write(text)
fileW.close()

We Are All Monica


This didn't work either. When opening up the file the second time as a binary file ("b" flag), I get this error when I try to write out the new text: "TypeError: must be bytes or buffer, not str". I tried it without the binary flag (so opening it as text) but I'm just getting the same problem. Each line is written out with \r\r\n instead of \r\n. – TMC Mar 05 '10 at 03:24

score 0 · Answer 4 · answered Mar 05 '10 at 03:11


I'm not too well versed in the special cases of file handling. However, since you mentioned that you are dealing with a CSV file (which can be opened with Excel), I would recommend taking a peek at pyExcelerator.

Hope this helps

inspectorG4dget


Useful library to have handy, but overkill for me right now. I already have the CSV files written out correctly and can easily import to Excel. – TMC Mar 05 '10 at 03:27

score 0 · Answer 5 · edited May 23 '17 at 12:31


To correctly write the CSV files instead of correcting them after the fact, see this question: Python3: writing csv files
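For reference, a minimal sketch of that approach in Python 3: pass `newline=""` to `open` so the `csv` module's own "\r\n" terminator reaches the disk untouched. The helper name `write_rows` and its parameters are made up for illustration, and `writeheader` is assumed to be available (it was added in Python 3.2).

```python
import csv


def write_rows(path, rows, fieldnames):
    """Write dict rows to `path` with exactly one CR LF per line."""
    # newline="" turns off newline translation in the text layer, so the
    # "excel" dialect's "\r\n" terminator is written unchanged.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, dialect="excel")
        writer.writeheader()  # assumption: Python 3.2 or later
        writer.writerows(rows)
```

A call such as `write_rows("example.csv", [{"name": "alpha", "value": 1}], ["name", "value"])` would then produce lines ending in "\r\n" on Windows as well as elsewhere.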


answered Feb 13 '12 at 18:02

Nick Garvey
