10

The things I've googled haven't worked, so I'm turning to experts!

I have some text in a tab-delimited text file that has some sort of carriage return in it (when I open it in Notepad++ and use "show all characters", I see [CR][LF] at the end of the line). I need to remove this carriage return (or whatever it is), but I can't seem to figure it out. Here's a snippet of the text file showing a line with the carriage return:

firstcolumn secondcolumn    third   fourth  fifth   sixth       seventh
moreoftheseventh        8th             9th 10th    11th    12th                    13th

Here's the code I'm trying to use to replace it, but it's not finding the return:

with open(infile, "r") as f:
    for line in f:
        if "\n" in line:
            line = line.replace("\n", " ")

My script just doesn't find the carriage return. Am I doing something wrong or making an incorrect assumption about this carriage return? I could just remove it manually in a text editor, but there are about 5000 records in the text file that may also contain this issue.

Further information: The goal here is to select two columns from the text file, so I split on \t characters and refer to the values as parts of an array. It works on any line without the returns, but fails on the lines with the returns because, for example, there is no element 9 in those lines.

vals = line.split("\t")
print(vals[0] + " " + vals[9])

So, for the line of text above, this code fails because there is no index 9 in that particular array. For lines of text that don't have the [CR][LF], it works as expected.

mrcoulson

5 Answers

7

Depending on the type of file (and the OS it comes from, etc.), your line ending might be '\r', '\n', or '\r\n'. The best way to get rid of them, regardless of which one they are, is to use line.rstrip().

with open(infile, "r") as f:
    for line in f:
        line = line.rstrip()  # strip all trailing whitespace (tabs and spaces included)

If you want to get rid of ONLY the line-ending characters and not any other whitespace that might be at the end, you can supply the optional argument to rstrip:

with open(infile, "r") as f:
    for line in f:
        line = line.rstrip('\r\n')  # strip only trailing \r and \n characters
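
To make the difference between the two calls concrete for tab-delimited data, here is a small sketch. The sample line is made up for illustration. A bare rstrip() also removes trailing tabs, which silently drops empty columns at the end of a record.

```python
# Hypothetical record whose last two columns are empty, with a Windows line ending.
line = "a\tb\t\t\r\n"

stripped_all = line.rstrip()        # removes \r, \n AND the trailing tabs
stripped_eol = line.rstrip("\r\n")  # removes only the line-ending characters

print(stripped_all.split("\t"))  # ['a', 'b']
print(stripped_eol.split("\t"))  # ['a', 'b', '', '']
```

If the column count matters, as it does when indexing vals[9], the second form is probably the safer choice.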

Hope this helps

inspectorG4dget
6

Here's how to remove carriage returns without using a temporary file:

with open(file_name, 'r') as file:
    content = file.read()

with open(file_name, 'w', newline='\n') as file:
    file.write(content)
Michael Hays
4

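In text mode, Python reads files with so-called universal newlines (the default in Python 3; requested with the 'U' flag in Python 2), so every line ending reaches your program as \n. The Python 2 documentation for open() describes it: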

Python is usually built with universal newlines support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program.

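You iterate through the file line by line and replace \n in each line. Each line the iterator yields still ends with \n, so the replacement does run, but it only changes that one string. The file has already been cut at every line break, so a record that contains a break stays split across two loop iterations.

Instead, you can read the whole file with f.read() and then replace \n in the resulting string.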

with open(infile, "r") as f:
    content = f.read()
    content = content.replace('\n', ' ')
    #do something with content
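
As a rough illustration of how the open mode changes what the script sees, the sketch below writes a small sample file with CR LF endings and reads it back three ways. It assumes Python 3; the temporary file and the sample contents are made up for the demonstration.

```python
import os
import tempfile

# Write a small sample file with Windows (CR LF) line endings, byte for byte.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"one\ttwo\r\nthree\tfour\r\n")

# Default text mode: every line ending is translated to "\n".
with open(path, "r") as f:
    translated = f.readlines()

# newline="" turns the translation off, so "\r\n" stays visible.
with open(path, "r", newline="") as f:
    untranslated = f.readlines()

# Binary mode returns raw bytes, also untranslated.
with open(path, "rb") as f:
    raw = f.readlines()

os.remove(path)

print(translated)    # ['one\ttwo\n', 'three\tfour\n']
print(untranslated)  # ['one\ttwo\r\n', 'three\tfour\r\n']
print(raw)           # [b'one\ttwo\r\n', b'three\tfour\r\n']
```

Note that in all three cases each line keeps its terminator; only the characters used for it differ.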
ovgolovin
4

Technically, there is an answer!

with open(filetoread, "rb") as inf:
    # In Python 3 binary mode yields bytes, so the output is opened in binary mode too.
    with open(filetowrite, "wb") as fixed:
        for line in inf:
            fixed.write(line)

The b in open(filetoread, "rb") opens the file in binary mode, which turns off newline translation, so the [CR][LF] bytes are visible to the script and I can remove them. This answer actually came from Stack Overflow user Kenneth Reitz off the site.
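
Once the line breaks are visible, one possible way to repair the broken records is to glue short lines onto the line that follows them until the expected number of columns is reached. This is only a sketch under assumptions the thread does not confirm: every complete record has a known, fixed number of tab-separated fields, and a stray break always falls before the last field is reached. The function name rejoin_records, its parameters, and the sample data are hypothetical.

```python
def rejoin_records(lines, expected_fields, joiner=" "):
    """Yield records, joining continuation lines onto records that are too short."""
    pending = None
    for line in lines:
        text = line.rstrip("\r\n")  # drop only the line ending, keep tabs
        pending = text if pending is None else pending + joiner + text
        if len(pending.split("\t")) >= expected_fields:
            yield pending
            pending = None
    if pending is not None:  # incomplete record left over at end of input
        yield pending


# Made-up sample: the second record is broken in the middle of its third field.
sample = ["1\t2\t3\t4\r\n", "a\tb\tbroken\r\n", "field\td\r\n"]
fixed = list(rejoin_records(sample, expected_fields=4))
print(fixed)  # ['1\t2\t3\t4', 'a\tb\tbroken field\td']
```

An open file object can be passed in place of the sample list. A break inside the last column cannot be detected this way, because the first half already has the full field count.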

Thanks everyone!

mrcoulson
3

I wrote some code to do this, and it works:

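end1 = r'C:\...\file1.txt'  # raw strings keep the backslashes literal
end2 = r'C:\...\file2.txt'
with open(end1, "rb") as inf:
    with open(end2, "wb") as fixed:
        for line in inf:
            # Lines are bytes in binary mode, so the patterns must be bytes too.
            line = line.replace(b"\n", b"")
            line = line.replace(b"\r", b"")
            fixed.write(line)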
Ajean
Raphael