I have a piece of code that reads from a gunzip stream and checks whether each line contains some pattern. What I have is:

if pattern in line:
    do_something()

Some lines contain non-ASCII characters, and when my code reaches those lines I get a UnicodeDecodeError. However, I cannot reproduce this error in manual testing: when I copy the repr of the line that causes the UnicodeDecodeError, assign it to a variable line, and evaluate pattern in line, I get False instead of an error. I am confused by this inconsistency. Why does the same string behave differently?

user274602
  • 1
    Because `repr(some_string)` is not the same as `some_string`, it is a representation of it. For some types it aspires to give a representation that can be used to construct a new instance of the type if given as input to an interpreter or as code, but not for all. See http://stackoverflow.com/questions/7784148/understanding-repr-function-in-python. – Ilja Everilä Jun 09 '16 at 19:24
  • 1
    Aside: if you're a beginner, you should probably be using Python 3 instead of Python 2. Unicode in particular is handled much better, and while there's still some stuff you have to learn at least what you'll be learning makes sense. – DSM Jun 09 '16 at 19:56

1 Answer

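I found the root cause of my problem. In my actual code, pattern has type unicode instead of str, but in manual testing my pattern was just a str that I typed in. In Python 2, testing a unicode pattern against a str line implicitly decodes the line as ASCII, which raises UnicodeDecodeError on non-ASCII bytes; with two str values the comparison is done on raw bytes and nothing is decoded. This causes the different behavior I observed.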
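
Here is a minimal sketch of the mismatch, written so that it runs on both Python 2.7 and Python 3. The sample line, the patterns, and the UTF-8 encoding are assumptions for illustration, not taken from my real code. The ASCII decode that Python 2 performs implicitly is done explicitly here so the failure is visible on either version.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: raw bytes as they might come off the stream
# (UTF-8 encoded text containing a non-ASCII character).
line = b'caf\xc3\xa9 latte'

# Byte pattern against a byte line: plain byte comparison, no decoding.
byte_match = b'tea' in line

# With a unicode pattern, Python 2 first decodes the byte line as ASCII.
# Doing that decode explicitly shows where the error comes from.
try:
    line.decode('ascii')
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

# One possible fix: decode the line with its real encoding (assumed to be
# UTF-8 here) and compare text with text.
text = line.decode('utf-8')
found = u'caf\xe9' in text
```

The other option is to keep everything as bytes and encode the pattern before the comparison. Either way, both operands should have the same type.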

user274602