UnicodeDecodeError when opening a random file?

Question

I am trying to open a random file in a directory and search it for a string. However, I get an error. Is the path I'm using wrong, or is it the way I am trying to read the file?

path = "C:\\Users\\ASDF\\Desktop\\profiles2\\"
random_file = random.choice(os.listdir(path))
filepath = os.path.join(path, random_file)
data = open(filepath).read()
if 'xpression' in data:
    print("true")

return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 9502: character maps to <undefined>

check this answer: https://stackoverflow.com/a/49562606/10358386 Specifying the encoding helps in most cases — octopathTraveller, Nov 30 '18 at 17:27
Your "any random file" must include some files that are not plain text, encoded as UTF8. By default and without further specification from your side, Python assumes this is the case. — Jongware, Nov 30 '18 at 17:43
You're not closing your handle... I'd suggest using `with open(...` — Maximilian Burszley, Nov 30 '18 at 18:06
Just checking - did you see my answer? Does it help? If not, what is missing? — Jongware, Dec 03 '18 at 09:15

score 1 · Answer 1 · answered Nov 30 '18 at 23:37

First off, your code as provided does not run; you forgot a few necessary import statements.

You get that UnicodeDecodeError because, when you don't specify an encoding, Python's open() decodes text files with a platform-dependent default (whatever locale.getpreferredencoding(False) returns). On Windows that is usually a code page such as cp1252 rather than UTF-8, which is why the traceback mentions the 'charmap' codec. If you select any random file from your computer, it may not be encoded that way at all, or not even be a text file to begin with. At that point the decoder hits a byte it has no character for (here 0x9d) and fails.

If you specify the encoding as latin1, then Python maps every byte one-to-one to a character, so it never meets a byte it cannot decode. That takes care of one problem.
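
To see the difference in isolation, here is a small sketch that decodes the byte from the traceback (0x9d) with both codecs. It assumes that the 'charmap' codec in the error message is cp1252, the usual default on Western-European Windows installs; the variable names are only for illustration.

```python
# The byte the traceback complains about.
raw = b"\x9d"

# Assumption: the 'charmap' codec from the error is cp1252,
# which has no character assigned to byte 0x9d.
try:
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# latin1 assigns a character to every one of the 256 byte values,
# so decoding cannot fail, whatever the file contains.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin1")

print(cp1252_ok)                              # False
print(len(as_text))                           # 256
print(as_text.encode("latin1") == all_bytes)  # True
```

The last line shows that the mapping is lossless: encoding the text back to latin1 gives the original bytes.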

After fixing that, another problem popped up in my random experimentation: os.listdir returns not only files but also folders. You could have the program stop with an appropriate error message, but you can also remove the folders from the list before picking one. There are several ways to do so (os.walk, for example), but I found a magic line that gets a list of just the files out of os.listdir in the question "How do I list all files of a directory?".
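
For comparison, the os.walk route mentioned above could look roughly like this. It is only a sketch: files_only is a hypothetical helper name, and the temporary directory exists purely to make the example self-contained.

```python
import os
import tempfile

def files_only(path):
    """Return the names of the non-directory entries directly inside path."""
    # os.walk yields (dirpath, dirnames, filenames) tuples, top directory
    # first, so the first tuple describes `path` itself.
    for _dirpath, _dirnames, filenames in os.walk(path):
        return filenames
    return []  # os.walk yielded nothing: path is missing or not a directory

# Throwaway directory with one file and one folder, just for the demo.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "subfolder"))
    found = files_only(tmp)
    print(found)  # ['a.txt']
```

Returning from inside the loop stops the walk after the top directory, so subfolders are never descended into.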

The following code works without errors on my system; running it several times in a row, once in a while it will say "true" (admittedly, I had to change the test text for that; your original text xpression occurs far too infrequently in my own files to test with).

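import os
import random

path = "."
# os.listdir also returns folders, so keep only the regular files
files = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]
random_file = random.choice(files)
print(random_file)

filepath = os.path.join(path, random_file)

# latin1 maps every byte to a character, so read() cannot raise UnicodeDecodeError
with open(filepath, encoding='latin1') as file:
    data = file.read()

if 'test' in data:
    print("true")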

This works with the encoding set to latin1 because plain ASCII text decodes to itself, and every other byte, binary content included, is simply mapped to some character instead of raising an error. However, it will randomly fail or succeed if your search text contains a non-ASCII character such as an accented letter. (It will only succeed when that random file happens to be encoded as Latin-1 as well, but fail if it was UTF-8.)
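
That last caveat can be reproduced without any files at all. The sketch below uses "é" as an arbitrary example search text and decodes both of its encoded forms as latin1, the way the code above reads every file.

```python
needle = "é"

# The same character stored in two different encodings.
utf8_bytes = needle.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = needle.encode("latin1")   # one byte: 0xE9

# Decode both as latin1, regardless of how they were really encoded.
from_utf8 = utf8_bytes.decode("latin1")
from_latin1 = latin1_bytes.decode("latin1")

print(needle in from_latin1)  # True: the file really was Latin-1
print(needle in from_utf8)    # False: the two UTF-8 bytes became two other characters
print(len(from_utf8))         # 2
```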
