When does Python raise UnicodeDecodeError when searching in a string?

Question

I have a piece of code that reads lines from a gzip stream and checks whether each line contains some pattern. What I have is:

if pattern in line:
    do_something()

Some lines contain non-ASCII characters, and when my code reaches those lines I get a UnicodeDecodeError. However, I am unable to reproduce the error in manual testing: when I copy the repr of the line that causes the UnicodeDecodeError, assign it to the variable line, and evaluate pattern in line, I get False instead of an error. I am confused by this inconsistency. Why does the same string behave differently?
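The inconsistency can be made visible with a minimal Python 3 sketch, assuming the gzip stream yields UTF-8 encoded byte lines. The helper py2_unicode_in_bytes is hypothetical: it only mimics what Python 2 does implicitly when a unicode pattern is searched for in a byte-string line, namely decode the line with the default ASCII codec before searching.

```python
# Python 3 sketch of Python 2's implicit coercion for `unicode_pattern in byte_line`:
# the byte string is decoded with the ASCII codec before the search happens.

def py2_unicode_in_bytes(pattern, raw_line):
    """Hypothetical helper: emulate Python 2's `unicode in str` behavior."""
    return pattern in raw_line.decode("ascii")  # fails on any byte >= 0x80

ascii_line = b"plain ascii line\n"
utf8_line = "caf\u00e9 au lait\n".encode("utf-8")  # bytes, as read from the stream

# An ASCII-only line decodes fine, so the search simply returns True or False.
print(py2_unicode_in_bytes("xyz", ascii_line))  # False

# A line with non-ASCII bytes fails during the implicit decode, before any search.
try:
    py2_unicode_in_bytes("xyz", utf8_line)
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc)

# Searching bytes for bytes (like a plain str pattern in Python 2) never decodes,
# so it cannot raise UnicodeDecodeError.
print(b"xyz" in utf8_line)  # False
```

The point of the sketch is that the error depends on the pattern's type, not on the line alone: the same bytes are searched safely with a byte pattern and fail with a text pattern.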

Because `repr(some_string)` is not the same as `some_string`; it is a representation of it. For some types it aspires to give a representation that can be used to construct a new instance of the type if given as input to an interpreter or as code, but not for all. See http://stackoverflow.com/questions/7784148/understanding-repr-function-in-python. — Ilja Everilä, Jun 09 '16 at 19:24
Aside: if you're a beginner, you should probably be using Python 3 instead of Python 2. Unicode in particular is handled much better, and while there's still some stuff you have to learn at least what you'll be learning makes sense. — DSM, Jun 09 '16 at 19:56
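To illustrate that aside, a short Python 3 sketch (the helper mixed_type_error is hypothetical): Python 3 refuses to mix str and bytes in an in test and raises TypeError for every line, ASCII or not, so a type mismatch between pattern and line surfaces on the first line searched rather than only on lines with non-ASCII bytes.

```python
# Python 3: searching for a str inside bytes is a TypeError, whatever the content.

def mixed_type_error(pattern, raw_line):
    """Hypothetical helper: return the TypeError raised by the search, or None."""
    try:
        pattern in raw_line  # evaluated only to see whether it raises
    except TypeError as exc:
        return exc
    return None

line = "caf\u00e9\n".encode("utf-8")  # bytes from a binary stream

print(mixed_type_error("xyz", b"plain ascii line\n"))  # TypeError even for ASCII-only bytes
print(mixed_type_error("xyz", line))                   # same TypeError for non-ASCII bytes
print(mixed_type_error(b"xyz", line))                  # None: bytes in bytes is fine
```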

score 1 · Answer 1 · answered Jun 09 '16 at 19:32

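I found the root cause of my problem. Somehow, in my actual code pattern had type unicode instead of str, whereas in manual testing pattern was just a str that I typed in. This caused the different behavior I observed: in Python 2, searching for a unicode object in a str makes Python implicitly decode the str with the default ASCII codec, which raises UnicodeDecodeError on lines containing non-ASCII bytes, while searching for a str in a str compares raw bytes and never decodes anything.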
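One way to avoid the mismatch, sketched here in Python 3 under the assumption that the compressed file holds UTF-8 text, is to let the gzip layer decode every line so that both operands of in are str. The function find_matching_lines and the file name sample.txt.gz are hypothetical. In Python 2 the analogous fix is to decode each line (or encode the pattern) so that both sides have the same type.

```python
import gzip
import os
import tempfile

def find_matching_lines(path, pattern, encoding="utf-8"):
    """Hypothetical helper: yield decoded lines containing pattern (assumes UTF-8)."""
    # "rt" mode wraps the decompressed bytes in a text layer, so each line is a str.
    with gzip.open(path, "rt", encoding=encoding) as stream:
        for line in stream:
            if pattern in line:  # str in str: no implicit decoding step
                yield line.rstrip("\n")

# Usage: write a small gzip file with a non-ASCII line, then search it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("plain ascii line\ncafé au lait\n")
    matches = list(find_matching_lines(path, "café"))

print(matches)  # ['café au lait']
```

Decoding once at the stream boundary keeps every comparison in the loop text-to-text, which is the design the Python 3 str/bytes split encourages.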

user274602
