Python: line termination while reading and writing files

Question

I've written a simple utility that removes certain lines from a file. It reads the file into a list of lines (Python) and writes them back as a single concatenated string, newlines preserved. Some lines are dropped and some get commented out in the process; the percentage of change is negligible. Yet diff shows me one big red block before and one big green block after. To the naked eye the resulting file looks fine. I suspected some subtle difference in trailing spaces or something like that, but is that really possible? Had I added something invisible to each line, every red line would have been followed by the corresponding green one. Or so I gather.

UPD:

Well, I'm told it is certainly the line endings. The essentials of my code:

def check_file(path):
    out_line = ""
    with open(path, "r") as f_r:
        for line in f_r.readlines():
            drop_it, o_line = consume_line(line)
            if drop_it:
                pass
            else:
                out_line += o_line
    with open(path, "w") as f_w:
        f_w.write(out_line)

consume_line() essentially returns its argument as is. It may be either scheduled for dropping, or uncommented/commented out, C++ style, in certain infrequent cases. No manual fiddling with line endings in any case.

No editor reports any change in the total number of lines if no line is dropped. The files originated and are handled on Linux.

It shows me a lot of `~` in a column. New lines, apparently. — Alexey Orlov, Mar 25 '18 at 17:51
Possible duplicate of [diff returning entire file for identical files](https://stackoverflow.com/questions/12876350/diff-returning-entire-file-for-identical-files) — Biffen, Mar 25 '18 at 18:03
Git keeps a copy of each file that is committed, stashed and deleted in the `.git` folder. Then when you make a change to a file, it compares the last committed version of that file in the `.git` folder to the same file in the parent folder (root folder). If the line numbers and/or the content on a line differ, it assumes you've changed it. That's why Git sometimes thinks you've changed half of the file when you add a new line in the middle of it. — Prav, Mar 25 '18 at 18:29
As @Biffen told you, that is a problem of end-of-line handling. — Philippe, Mar 25 '18 at 19:21
@AlexeyOrlov You read `line` but append `o_line`. Are you sure they are the same? — phd, Mar 26 '18 at 03:03
They are *mostly* the same. `consume_line()` may change the head of the line, but it *always* leaves the line ending alone. — Alexey Orlov, Mar 26 '18 at 03:44
I did some research; with the Universal Newline Support it should be a piece of cake, but it is not. — Alexey Orlov, Mar 26 '18 at 06:00
Yes indeed! The original file has `\r\n` endings. They get replaced with `\n`. With the Universal Newline Support this should not happen. Or should it? — Alexey Orlov, Mar 26 '18 at 06:07
It should. Physical line ending in files could be whatever but logical line ending after reading is always `\n`. `\n` must be replaced with `\r\n` on writing back to file. — phd, Mar 26 '18 at 15:03
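
The translation described in the last comment can be seen in a small experiment. This is only a sketch, assuming Python 3; the temporary file and the helper name `roundtrip_demo` are made up for illustration and are not part of the original code.

```python
import os
import tempfile


def roundtrip_demo():
    """Show how text-mode newline translation hides CRLF endings on read."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Write raw bytes so the file really contains CRLF endings.
        with open(path, "wb") as f:
            f.write(b"one\r\ntwo\r\n")

        # Default text mode (newline=None): "\r\n" is translated to "\n" on read.
        with open(path, "r") as f:
            translated = f.read()
            seen = f.newlines  # the endings actually encountered in the file

        # newline="" disables translation, so the endings come back untouched.
        with open(path, "r", newline="") as f:
            untranslated = f.read()
    finally:
        os.remove(path)
    return translated, seen, untranslated
```

For this file the default read yields `\n`-terminated text, `f.newlines` reports `'\r\n'`, and the `newline=""` read returns the original `\r\n` endings. Writing the translated text back on Linux therefore produces `\n` endings, which is why diff flags every line.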

Alexey Orlov · Accepted Answer · 2018-03-27T05:19:04.540

This code sets every line ending to the first EOL style found in the file. Trailing spaces are preserved.

g_newline = ''


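def current_nl(f):
    """Return the line ending to write, cached after the first call."""
    global g_newline

    if g_newline:
        return g_newline
    # f.newlines is None (no ending seen), a str, or a tuple (mixed endings).
    g_newline = f.newlines or '\n'
    if isinstance(g_newline, tuple):
        print('!!! Something wrong. This is supposed to be the first eol in the file\n')
        # Cache the fallback too; otherwise the next call would return the tuple.
        g_newline = '\n'
    return g_newline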


def check_file(path):
    global g_newline

    g_newline = ''
    out_line = ""

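    with open(path, "r") as f_r:
        # Text mode translates every ending to '\n'; f_r.newlines records
        # the endings actually seen in the file.
        for line in f_r.readlines():
            drop_it, o_line = consume_line(line)
            if not drop_it:
                out_line += o_line.rstrip('\n') + current_nl(f_r)
    # newline="" writes the endings verbatim, without translating '\n' to os.linesep.
    with open(path, "w", newline="") as f_w:
        f_w.write(out_line)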
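
A possible alternative, not what the code above does, is to switch newline translation off altogether with `newline=""`, so each line keeps whatever ending it had and nothing needs to be detected. The sketch below assumes Python 3; the function name `check_file_preserving_eol` is hypothetical, and `consume_line` is passed in as a parameter only to keep the example self-contained.

```python
def check_file_preserving_eol(path, consume_line):
    """Filter a file in place while keeping each line's original ending.

    consume_line is assumed to return (drop_it, o_line) and to leave the
    line ending of o_line untouched, as described in the question.
    """
    kept = []
    # newline="" turns off translation: lines keep "\r\n", "\n" or "\r" as-is.
    with open(path, "r", newline="") as f_r:
        for line in f_r:
            drop_it, o_line = consume_line(line)
            if not drop_it:
                kept.append(o_line)
    # newline="" on write also stops "\n" from being expanded to os.linesep.
    with open(path, "w", newline="") as f_w:
        f_w.write("".join(kept))
```

With this approach a file with mixed endings keeps them mixed, and a last line without a trailing newline stays that way, so diff should only show the lines that were actually dropped or changed.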
