Handling backslash escapes in filenames read from a file

Question

TL;DR: a text file contains strings with backslash escapes in them; how do I use them as input to os.stat()?

I've an input file input.txt:

./with\backspace
./with\nnewline

Processing them with a simple loop doesn't work:

>>> import os
>>> with open('input.txt') as f:
...     for line in f:
...         os.stat(line.strip())
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
FileNotFoundError: [Errno 2] No such file or directory: './with\\backspace'

Using .decode("unicode_escape"), as suggested in another question, works only partially: the first line in the file fails, while the second one (with \n) succeeds.

Sidenote: The input filenames have ./ and I know I can just use os.listdir('.') and iterate over files till I find the right ones. That's not my objective. The objective is processing filenames that contain backslash escapes from a file.

Additional test:

>>> import os
>>> with open('./input.txt') as f:
...     for l in f:
...         os.stat(l.strip().decode('unicode_escape'))
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> with open('./input.txt') as f:
...     for l in f:
...         try:
...             os.stat(l.strip().encode('utf-8').decode('unicode_escape'))
...             print(l.strip())
...         except:
...             pass
... 
os.stat_result(st_mode=33188, st_ino=1053469, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=1536468565, st_mtime=1536468565, st_ctime=1536468565)
./with\nnewline

Passing an explicit string literal through os.fsencode() works:

>>> os.stat(os.fsencode('with\x08ackspace'))
os.stat_result(st_mode=33188, st_ino=1053465, st_dev=2049, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=1536468565, st_mtime=1536468565, st_ctime=1536468565)

However, with multiple variations on the same command, I still can't read the string from the file such that os.stat() accepts it.

>>> with open('./input.txt') as f:
...      for l in f:
...          os.stat(os.fsdecode( bytes(l.strip(),'utf-8').decode('unicode_escape').encode('latin1') ) )
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
FileNotFoundError: [Errno 2] No such file or directory: './with\x08ackslash'

Please clarify what the correct resulting transformation should be for the two samples. — wallyk, Sep 09 '18 at 03:37
@wallyk I'm not sure what you mean by the phrase "correct resulting transformation", so please clarify. The objective is to pass each string as a valid pathname to `os.stat()`. As you can see from the two examples, both the string as read from the file and the result of `.decode()` are strings that `os.stat()` cannot resolve. Files with such filenames exist on my filesystem and I need a Pythonic way of accessing them. Does that help? — Sergiy Kolodyazhnyy, Sep 09 '18 at 03:42
So is the `\b` a backspace or is it a directory separator followed by a `b`? — Stephen Rauch, Sep 09 '18 at 04:45
@StephenRauch I'm working on Linux, so it's not a directory separator. The file is created with `touch with$'\b'ackspace`. So `\b` in the filename is the actual backspace byte. In the `input.txt` that's a string. As I mentioned in the question, I've looked at a related question that suggests using `.decode('unicode_escape')` on a string, however that doesn't work. I'll edit the question to add more output in a second — Sergiy Kolodyazhnyy, Sep 09 '18 at 04:52
I would like to say that putting a backspace into your filename is an extraordinarily bad idea. — Stephen Rauch, Sep 09 '18 at 04:53
Have you tried: https://stackoverflow.com/questions/1885181/how-do-i-un-escape-a-backslash-escaped-string-in-python ? — Stephen Rauch, Sep 09 '18 at 04:56
@StephenRauch I'm very well aware of that. Not the first day of dealing with filenames on Linux. However, I am trying to anticipate such filenames from another user if they use my scripts. It's the same reasoning as to why in bash we do `while IFS= read -r line; do ..; done < input.txt` We can't assume `input.txt` won't have backslashes in text, or leading whitespace in the line. — Sergiy Kolodyazhnyy, Sep 09 '18 at 04:56
@StephenRauch Yes I have tried the linked suggestion. I actually linked it in my question. The added output at the end of the question also shows the linked suggestion doesn't help. — Sergiy Kolodyazhnyy, Sep 09 '18 at 04:57
At work I have Linux readily available, at home Windows. Wish I could be more helpful.. Good luck. — Stephen Rauch, Sep 09 '18 at 04:58
@StephenRauch No worries. Thanks for attempting to help anyway. — Sergiy Kolodyazhnyy, Sep 09 '18 at 04:59

score 1 · Answer 1 · answered Sep 09 '18 at 05:17

Works on macOS:

touch $'with\backspace'
touch $'with\newline'
echo $'./with\\backspace\n./with\\newline' > input.txt
python
>>> import os
>>> with open('./input.txt') as f:
...     for l in f:
...         os.stat(l.strip().decode('unicode_escape'))
posix.stat_result(st_mode=33188, st_ino=8604304962, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=0, st_atime=1536469815, st_mtime=1536469815, st_ctime=1536469815)
posix.stat_result(st_mode=33188, st_ino=8604305024, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=0, st_atime=1536470112, st_mtime=1536470112, st_ctime=1536470112)

That's with Python 2.7.14 on Darwin Kernel Version 17.7.0.

Tested the same with Python 2.7.15. `OSError: [Errno 2] No such file or directory: './with\x08ackslash'` — Sergiy Kolodyazhnyy, Sep 09 '18 at 05:27
Never mind. I've realized the input file was incorrect. It does work indeed. — Sergiy Kolodyazhnyy, Sep 09 '18 at 06:08

score 0 · Answer 2 · answered Sep 09 '18 at 06:29

After about two hours of going over this, I realized that the input file contained ./with\backslash, while the actual file was created via touch with$'\b'ackspace. So Health Raftery's answer works, but only in Python 2. In Python 3 you get AttributeError: 'str' object has no attribute 'decode', since strings in Python 3 are already Unicode strings.
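
To illustrate the mix-up described above, here is a minimal sketch in Python 3. The helper name `unescape` and the two sample lines are made up for illustration, and the sketch assumes ASCII-only names. It shows that the mistaken entry decodes to a name that differs from the real one, and that `str` has no `.decode()` method in Python 3.

```python
def unescape(line):
    # Python 3 str has no .decode(), so encode to bytes first, then let the
    # 'unicode_escape' codec interpret sequences such as \b and \n.
    # Assumption: the line is ASCII-only.
    return line.strip().encode('utf-8').decode('unicode_escape')

# Raw lines as they would be read from a (hypothetical) input file.
wrong_line = './with\\backslash\n'   # the mistaken entry
right_line = './with\\backspace\n'   # the entry matching the real file

print(repr(unescape(wrong_line)))    # './with\x08ackslash'
print(repr(unescape(right_line)))    # './with\x08ackspace'
print(hasattr('', 'decode'))         # False
```

Only the second decoded name matches a file created with touch with$'\b'ackspace, which is why the earlier attempts kept raising FileNotFoundError.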

In the process, I may have found a better approach via os.fsencode() referenced in jfs's answer.

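import os

with open('./input.txt') as f:
    for l in f:
        # Encode to bytes first, because str has no .decode() in Python 3.
        # Alternatively, sys.getdefaultencoding() (needs "import sys") can
        # replace the hard-coded 'utf-8'.
        # Note: .decode('unicode_escape') returns a str, not bytes.
        filename = bytes(l.strip(), 'utf-8').decode('unicode_escape')
        f_stat = os.stat(os.fsdecode(filename))
        print(l.strip(), f_stat)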

Since I mostly use Python 3, this is what I was looking for. However, Health Raftery's answer is nonetheless valid, hence +1'ed.
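
One caveat with the snippet above: the 'unicode_escape' codec reads bytes above 127 as Latin-1, so encoding with UTF-8 first mangles non-ASCII filenames. A possible workaround, sketched below under that assumption, is to encode with Latin-1 and the 'backslashreplace' error handler instead. The names `unescape_path` and `stat_listed_paths` are hypothetical helpers, not part of any library.

```python
import os

def unescape_path(text):
    # 'backslashreplace' turns characters outside Latin-1 into \uXXXX escapes,
    # which 'unicode_escape' then decodes back together with \b, \n, etc.
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')

def stat_listed_paths(list_file):
    # Yield (raw_line, stat_result) for every path listed in list_file.
    # Assumption: the list file itself is UTF-8 encoded.
    with open(list_file, encoding='utf-8') as f:
        for line in f:
            raw = line.rstrip('\n')  # keep leading/trailing spaces in names
            yield raw, os.stat(unescape_path(raw))
```

With this approach a literal backslash in a filename has to be written as \\ in the list file, and a line ending in a single backslash will raise a UnicodeDecodeError, so the input format should be documented for users.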
