Today I learned that with open(filename).read()
we cannot rely on the resources held by the implicit file object being released immediately, even though that is what I observed on my system. (See the accepted answer to the question Does reading an entire file leave the file handle open?)
The second answer there dissuaded me from writing my own helper function: it says that pathlib
already offers exactly this function.
But this does not actually seem to be the case. With the following script (test.py), the two functions give different results:
# The German accent characters are Ä,Ö,Ü,ä,ö,ü, and ß.
from pathlib import Path

def pathlib_read_text(filename, encoding=None):
    return Path(filename, encoding=encoding).read_text()

def mylocal_read_text(filename, encoding=None):
    with open(filename, encoding=encoding) as f:
        return f.read()

def test(fun):
    print(fun+'_read_text:')
    print(eval(fun+'_read_text')(__file__, 'utf-8'))

test('pathlib')
test('mylocal')
When the output goes to the Windows console (python test.py), the first block contains Ã",Ã-,Ão,ä,ö,ü, and ÃY.
When I redirect the output into a file instead, it is the second block that comes out wrong if the file is treated as utf-8 (Notepad++ displays xC4,xD6,xDC,xE4,xF6,xFC, and xDF in white on black).
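Both symptoms look consistent with a mismatch between utf-8 and a single-byte Windows code page. The following is only a sketch of that suspicion, not a diagnosis: cp1252 is an assumed code page (the one actually in effect may differ), and the variable names are just for illustration.

```python
# Sketch only: cp1252 is an assumption, not something the script above confirms.
accents = 'Ä,Ö,Ü,ä,ö,ü,ß'

# Case 1: the UTF-8 bytes of the source file decoded with the wrong codec.
utf8_bytes = accents.encode('utf-8')
garbled = utf8_bytes.decode('cp1252')
print(ascii(garbled))   # ascii() keeps the print safe on any console

# Case 2: correctly decoded text written back out as single bytes.
single_bytes = accents.encode('cp1252')
print(single_bytes)

# Those single bytes are not a valid UTF-8 sequence.
try:
    single_bytes.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)
```

In the first case every accented letter turns into two characters starting with Ã; in the second, the bytes are the C4, D6, DC, E4, F6, FC, and DF values that Notepad++ shows.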
Is there anything I overlooked?
I tried to examine the 3.6.3 code, but found no bug so far ...
Edit
The following version reinforces my feeling that it's a bug in pathlib
or in one of the underlying libraries/functions. Maybe it's a Windows-only issue, since the default encoding there usually differs from utf-8.
With this version, it's sufficient to run the test in a console window.
accents = '''
Ä,Ö,Ü,ä,ö,ü,ß
'''
from pathlib import Path
import codecs

def pathlib_read_text(filename, encoding=None, errors=None):
    return Path(filename, encoding=encoding, errors=errors).read_text()

def mylocal_read_text(filename, encoding=None, errors=None):
    with open(filename, encoding=encoding, errors=errors) as f:
        return f.read()

def space_it(error):
    # An error handler must return (replacement, position to resume at).
    return (' ', error.end)

codecs.register_error('space_it', space_it)

def test(fun):
    s = eval(fun+'_read_text')(__file__, 'utf-8', errors='space_it')
    # Line 2 of this file is the accents line inside the string literal above.
    print(fun+'_read_text:', s.split("\n")[1] == accents.strip())

test('pathlib')
test('mylocal')
It produces the following output:
pathlib_read_text: False
mylocal_read_text: True