File size changes after read/write txt file in python

Question

After executing the following code to generate a copy of a text file with Python, the newfile.txt doesn't have the exact same file size as oldfile.txt.

with open('oldfile.txt','r') as a, open('newfile.txt','w') as b:
    content = a.read()
    b.write(content)

For example, oldfile.txt is 667 KB, while newfile.txt is 681 KB.

Does anyone have an explanation for that?

Is it difficult for you to check two files for differences? There are tools for that. (But my guess is you may find it's End-of-line related.) — Jongware, Mar 14 '18 at 09:31
[When to open file in binary mode (b)?](https://stackoverflow.com/questions/31483253/when-to-open-file-in-binary-mode-b) answers that, with longer explanations of "text" mode than in the answers here. — Jongware, Mar 14 '18 at 09:59
Yes, I generated oldfile.txt in Windows. Its newline characters were `\r\n`, while the newline characters in newfile.txt were `\n`. I see, opening the files in binary mode `with open('oldfile.txt', 'rb') as a, open('newfile.txt', 'wb') as b:`... preserves the newline characters. — Matthias, Apr 03 '19 at 20:56

Giacomo Catenazzi · Accepted Answer · 2018-03-14T10:07:50.833

There are various causes.

You are opening the file as a text file, so its bytes are interpreted (decoded) into Python strings when read, and then encoded again when written. So the content can change along the way.

From the open() documentation (https://docs.python.org/3/library/functions.html#open):

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.

So if the original file used \r\n line endings (e.g. it was generated on Windows), the \r is removed when reading. When writing the file back, \n is translated to the platform's line separator: on Linux or macOS the original \r is not restored, while on Windows every line ending becomes \r\n. The latter seems to be your case, because your file increased in size.
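A minimal sketch of that effect, assuming a small sample file with \n line endings. The file names, the sample content, and the explicit newline="\r\n" (used here only to imitate the Windows default on any platform) are illustrative assumptions, not taken from the question:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
old = os.path.join(workdir, "oldfile.txt")
new = os.path.join(workdir, "newfile.txt")

# Write raw bytes so the line endings on disk are known exactly.
with open(old, "wb") as f:
    f.write(b"one\ntwo\nthree\n")  # 14 bytes, "\n" line endings

# Default newline=None on read: universal newlines, everything becomes "\n".
with open(old, "r", encoding="ascii") as a:
    content = a.read()

# newline="\r\n" on write imitates the Windows default (assumption for the demo):
# every "\n" in the string is written as "\r\n".
with open(new, "w", encoding="ascii", newline="\r\n") as b:
    b.write(content)

old_size = os.path.getsize(old)
new_size = os.path.getsize(new)  # one extra byte per line

# newline="" disables the translation in both directions.
with open(old, "r", encoding="ascii", newline="") as a, \
     open(new, "w", encoding="ascii", newline="") as b:
    b.write(a.read())

preserved_size = os.path.getsize(new)
print(old_size, new_size, preserved_size)
```

With newline="" on both sides the line endings pass through untouched, so the size only changes when a translation is actually applied.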

The encoding can also change the text. E.g. a BOM could be removed (or added), and potentially (but AFAIK this is not done implicitly) redundant codes could be removed: Unicode has some codes that change the behaviour of nearby codes, and one could add several of them in a row, but only the last one is effective.
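As a rough illustration of the BOM case, the sketch below assumes a UTF-8 file that starts with a BOM and is read with the utf-8-sig codec, which strips it. The file names and content are made up for the example. A binary-mode copy is included for comparison, since it skips decoding entirely:

```python
import filecmp
import os
import tempfile

workdir = tempfile.mkdtemp()
old = os.path.join(workdir, "oldfile.txt")
text_copy = os.path.join(workdir, "text_copy.txt")
binary_copy = os.path.join(workdir, "binary_copy.txt")

# 3-byte UTF-8 BOM + "hello" + "\r\n" = 10 bytes on disk.
with open(old, "wb") as f:
    f.write(b"\xef\xbb\xbfhello\r\n")

# Text-mode copy: "utf-8-sig" drops the BOM on read, plain "utf-8" does not
# write one back. newline="" keeps the line endings out of the picture.
with open(old, "r", encoding="utf-8-sig", newline="") as a, \
     open(text_copy, "w", encoding="utf-8", newline="") as b:
    b.write(a.read())

# Binary-mode copy: bytes are neither decoded nor translated.
with open(old, "rb") as a, open(binary_copy, "wb") as b:
    b.write(a.read())

old_size = os.path.getsize(old)
text_copy_size = os.path.getsize(text_copy)  # BOM is gone
identical = filecmp.cmp(old, binary_copy, shallow=False)
print(old_size, text_copy_size, identical)
```

If the goal is an exact copy rather than text processing, the binary-mode variant is the safer choice.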

True, thank you. When I read/write in binary mode, the file content remains unchanged. — Matthias, Mar 14 '18 at 10:17

score 0 · Answer 2 · answered Mar 14 '18 at 09:36

I tried this on Linux / Ubuntu. It works as expected: the file sizes of both files are exactly equal.

At this point, I guess this behavior is not related to Python; maybe it depends on your filesystem (compression) or operating system.

r4r3devAut


A single test is conclusive only if it proves that something is wrong. You have not tested all cases. – Giacomo Catenazzi Mar 14 '18 at 09:48
