First off, your code as provided does not run; you forgot a few necessary import statements.
You get that UnicodeDecodeError because Python opens text files with your platform's default encoding, which is apparently UTF-8 here, and if you select any random file from your computer, it may not be UTF-8 encoded at all, or not even be a text file to begin with. At that point the UTF-8 decoder fails to decode the input.
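A minimal sketch of that failure. The byte string here is made up for illustration and stands in for the contents of some arbitrary non-UTF-8 file:

```python
# Hand-picked bytes that are not valid UTF-8 (an assumption standing in for
# a random file): 0xE9 starts a multi-byte sequence but is followed by a space.
raw = b"caf\xe9 \xff\xfe"

try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print("decoding failed:", err.reason)
```

Reading a file in text mode with a UTF-8 default goes through the same decoding step, which is where the exception comes from.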
If you specify the encoding as latin1, then Python assumes a one-to-one encoding of bytes to characters, and it will not try to decode "as if" it's UTF-8 anymore. That takes care of one problem.
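The one-to-one mapping can be checked directly. This is only a sketch to show why decoding as latin1 cannot fail, whatever the file contains:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Latin-1 maps byte N to code point N, so this never raises.
text = all_bytes.decode("latin1")

# Each character's code point equals the byte it came from.
code_points = [ord(ch) for ch in text]

# Encoding back gives the original bytes unchanged.
round_trip = text.encode("latin1")
```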
After fixing that, another one popped up in my random experimentation: os.listdir returns not only a list of files, but it may also include folders. You could have the program stop with the appropriate error message, but you can also remove the folders from your list before picking one. There are several methods to do so (os.walk, for example), but I found a magic line to get a list of just files out of os.listdir from How do I list all files of a directory?.
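As one more alternative, os.scandir can do the same filtering. The helper name list_files and the temporary directory contents below are hypothetical, chosen only to make the sketch self-contained:

```python
import os
import tempfile

def list_files(path):
    """Return the names of regular files directly inside path (no folders)."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())

# Build a throwaway directory holding one folder and one file.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subfolder"))
    with open(os.path.join(tmp, "notes.txt"), "w") as fh:
        fh.write("test")
    found = list_files(tmp)

print(found)  # ['notes.txt']
```

The folder is skipped, so random.choice would only ever be handed names that open() can read.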
The following code works without errors on my system; running it several times in a row, once in a while it will say "true" (admittedly, I had to change the test text for that; your original text xpression occurs way too infrequently in my own files to test with).
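
    import os
    import random

    path = "."
    # Keep only regular files; os.listdir also returns folders.
    files = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]
    random_file = random.choice(files)
    print(random_file)
    filepath = os.path.join(path, random_file)
    # latin1 maps every byte to a character, so reading never fails to decode.
    with open(filepath, encoding='latin1') as file:
        data = file.read()
    if 'test' in data:
        print("true")
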
This works with the encoding set to latin1 because Latin-1 assigns a character to every possible byte value: plain ASCII data is read as such, and any binary content is passed through as meaningless characters instead of raising an error. However, if your search text contains a non-ASCII character such as an accented letter, the search will succeed or fail depending on the file. (It will only succeed when that random file happens to be encoded as Latin-1 as well, but fail if it was UTF-8.)
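A small sketch of that last caveat, using the made-up word "café" in place of real file contents:

```python
needle = "\u00e9"  # é

# The same text as it would be stored in a UTF-8 file and in a Latin-1 file.
utf8_bytes = "caf\u00e9".encode("utf-8")     # b'caf\xc3\xa9'
latin1_bytes = "caf\u00e9".encode("latin1")  # b'caf\xe9'

# Both are read back as latin1, as in the code above.
found_in_latin1_file = needle in latin1_bytes.decode("latin1")
found_in_utf8_file = needle in utf8_bytes.decode("latin1")

print(found_in_latin1_file, found_in_utf8_file)  # True False
```

In the UTF-8 case the two bytes of é are decoded as two separate Latin-1 characters, so the search text never matches.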