1

I have a problem comparing a string read from a file with a string I entered in the program. They should be equal, but whether or not I use decode('utf-8'), the comparison says they are not. Here's the code:

final = open("info", 'r')
exported = open("final", 'w')
lines = final.readlines()
for line in lines:
    if line == "Wykształcenie i praca": #error
        print "ok"

And this is how I write the file that I am trying to read:

comm_p = bs4.BeautifulSoup(comm)
comm_f.write(comm_p.prettify().encode('utf-8'))

for string in comm_p.strings:
    #print repr(string).encode('utf-8')
    save = string.encode('utf-8')  # this is how I save each string
    info.write(save)
    info.write("\n")

info.close()

At the top of the file I have # -- coding: utf-8 --

Any ideas?

adaniluk
  • Add `print "%r %r" % (line, "Wykształcenie i praca")` right before the comparison line and tell us what it says – georg Sep 24 '12 at 07:49

4 Answers

3

This should do what you need:

# -- coding: utf-8 --
import io

with io.open('info', encoding='utf-8') as final:
    lines = final.readlines()

for line in lines:
    if line.strip() == u"Wykształcenie i praca":  # strip() drops the trailing newline
        print "ok"

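You need to open the file with the right encoding, and since your string literal contains non-ASCII characters, you should mark it as a Unicode literal (u"..."). The strip() call removes the trailing newline that readlines() leaves on every line.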
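For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. It is not the asker's actual file: the temporary file and its two sample lines are made up for illustration, and the `__future__` import is only there so the snippet runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile

TARGET = u"Wykształcenie i praca"

# Write a small UTF-8 sample file (a hypothetical stand-in for "info").
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "w", encoding="utf-8") as sample:
    sample.write(u"Imię i nazwisko\n" + TARGET + u"\n")

# Read it back as decoded text; compare each line without its newline.
with io.open(path, encoding="utf-8") as final:
    matches = [line.strip() == TARGET for line in final]

os.remove(path)
print(matches)  # [False, True]
```

Only the second line matches, and it only matches because of strip(); without it both comparisons would be False.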

Burhan Khalid
0

The difference is most likely a trailing '\n' character.

readlines() doesn't strip the '\n' from each line; see "Best method for reading newline delimited files in Python and discarding the newlines?"
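A quick way to see this for yourself, as a rough sketch: the in-memory buffer below stands in for the file (its contents are made up), and repr() makes the otherwise invisible newline visible.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io

TARGET = u"Wykształcenie i praca"

# An in-memory text buffer standing in for the real file.
buf = io.StringIO(TARGET + u"\nInne\n")
lines = buf.readlines()
first = lines[0]

print(repr(first))                  # the repr ends with \n

raw_equal = first == TARGET         # newline still attached
stripped_equal = first.rstrip(u"\n") == TARGET

print(raw_equal)       # False
print(stripped_equal)  # True
```

rstrip(u"\n") removes only the newline; plain strip() would also remove leading and trailing spaces, which may or may not be what you want.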

In general, it is not a good idea to put a non-ASCII string literal in your code; it is better to read it from a resource file.

Ofir
  • You're right, it's difficult to notice such a small mistake when you think the encoding is causing the error :P – adaniluk Sep 24 '12 at 07:55
0

First, you need some basic knowledge about encodings. This is a good place to start. You don't have to read everything right now, but try to get as far as you can.

About your current problem:

You're reading a (probably) UTF-8 encoded file, but you're not decoding it. In Python 2, open() doesn't do any conversion for you: each line comes back as a raw byte string, not as Unicode text.

So what you need to do (at least):

  • use codecs.open("info", "r", encoding="utf-8") to read the file
  • use Unicode strings for comparison: if line.rstrip() == u"Wykształcenie i praca":
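To make the bytes-versus-text distinction concrete, here is a small sketch that does the decoding by hand instead of going through codecs.open. That is a deliberate swap, purely to show what the decoding step does; the temporary file and its single line are invented for the example, and the code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile

TARGET = u"Wykształcenie i praca"

# Create a UTF-8 encoded sample file (hypothetical stand-in for "info").
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "wb") as sample:
    sample.write(TARGET.encode("utf-8") + b"\n")

# Binary mode returns the bytes exactly as stored, newline included.
with io.open(path, "rb") as raw_file:
    raw_line = raw_file.readline()

os.remove(path)

text_line = raw_line.decode("utf-8")     # bytes -> Unicode text
is_match = text_line.rstrip() == TARGET  # compare Unicode with Unicode

print(is_match)  # True
```

Opening the file with an encoding argument, as in the first bullet, performs the same decode for every line automatically.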
Tim Pietzcker
0

Use Unicode strings for the comparison:

>>> s = u'Wykształcenie i praca'
>>> s == u'Wykształcenie i praca'
True
>>>

When it comes to strings, Unicode is the smartest move :)

Anuj