I have a value stored in a database and a fresh unicode input, and I only update the database when they differ:
if new_field != old_field:
    update_db(new_field)
In PyCharm both values look identical, even when I hover over them and expand the "View" box. If I copy and paste them into Notepad they are also identical, e.g.:
u"<li>Laundromatic doofer drier £200 on collection</li>"
u"<li>Laundromatic doofer drier £200 on collection</li>"
What causes the mismatch is the underlying representation of the pound sign (why can't there be one Pythonic way to represent it?). Both values are unicode: type(new_field) is unicode.
I got so frustrated by this that I split each field (a load of sales blurb) into words and took the symmetric difference:
>>> set(old_field.split()) ^ set(new_field.split())
u'£200' # from new_field
u'£200' # from old_field
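Since the rendered text looks the same, a more reliable check is to print the code points themselves (repr() in 2.7 already shows escapes such as u'\xa3'). The sketch below uses two hypothetical helpers, first_difference and describe, to find the first differing position and name each code point via unicodedata.name. The sample values are assumptions for illustration only: the second is what the UTF-8 bytes of the pound sign (C2 A3) become if they are mis-decoded as Latin-1, which is just one possible way two "£" characters can differ. It runs on both 2.7 and 3.x.

```python
import unicodedata


def first_difference(a, b):
    """Return the index of the first differing character, or None if a == b."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other; they differ where the shorter ends.
        return min(len(a), len(b))
    return None


def describe(s):
    """Name every code point in s, e.g. 'U+00A3 POUND SIGN'."""
    return [u"U+%04X %s" % (ord(ch), unicodedata.name(ch, u"<unnamed>")) for ch in s]


# Hypothetical sample values: a single U+00A3, and the UTF-8 bytes of the
# pound sign (C2 A3) mis-decoded as Latin-1, i.e. two code points.
new_field = u"\xa3200"
old_field = u"\xc2\xa3200"

i = first_difference(new_field, old_field)
print(i)
print(describe(new_field[i:i + 2]))
print(describe(old_field[i:i + 2]))
```

If the names printed for the two fields differ, the strings genuinely hold different code points, whatever the editor displays.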
Is there a better way to compare unicode strings in Python (I'm using 2.7), i.e. something more universal than:
if new_field != old_field.replace(u"£", u"\xa3")
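A more general alternative to replacing one specific character is to compare Unicode-normalized forms with unicodedata.normalize. NFC maps canonically equivalent sequences (e.g. "é" as one code point versus "e" followed by a combining acute accent) to the same code points. One caveat: U+00A3 has no decomposition, so NFC leaves the pound sign untouched; if two values still differ after normalization, their code points really are different (for instance after a wrong encode/decode step) and normalization alone will not reconcile them. The helper name same_text is hypothetical.

```python
import unicodedata


def same_text(a, b, form="NFC"):
    """Compare two unicode strings after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = u"caf\u00e9"      # e-acute as a single code point
decomposed = u"cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(composed == decomposed)            # False: different code points
print(same_text(composed, decomposed))   # True: canonically equivalent

# The pound sign has no decomposition, so NFC does not change it, and a
# mis-decoded variant (U+00C2 U+00A3) still compares unequal.
print(same_text(u"\xa3200", u"\xc2\xa3200"))   # False
```

NFC is the usual choice for stored text. NFKC additionally folds compatibility characters, such as U+00A0 NO-BREAK SPACE into an ordinary space, which may or may not be wanted for sales copy.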
The new field came from the web and was then passed to bleach.clean, whose output I had to call .encode("utf-8") on because it was apparently producing characters (to represent nbsp) that sqlite3 rejected as illegal. The old field was fetched from sqlite and then passed through bleach.clean (as an afterthought), which did not require .encode("utf-8"), since sqlite3 hands back TEXT columns as unicode.
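Because the two fields reach the comparison by different routes (one through .encode("utf-8"), one straight from sqlite), one common remedy is to push both through the same conversion right before comparing. This sketch assumes every byte string in the pipeline is UTF-8; to_text is a hypothetical helper, not part of bleach or sqlite3. It also shows how decoding the same bytes with the wrong codec (Latin-1 here) produces a different unicode string.

```python
import unicodedata


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return NFC-normalized unicode.

    Byte strings are assumed to be in `encoding`; unicode is passed through.
    """
    if isinstance(value, bytes):   # `bytes` is an alias for `str` on Python 2.7
        value = value.decode(encoding)
    return unicodedata.normalize("NFC", value)


original = u"\xa3200 on collection"
as_bytes = original.encode("utf-8")        # what .encode("utf-8") hands back

# Decoding with the codec that was used to encode restores the same string.
print(to_text(as_bytes) == to_text(original))     # True

# Decoding the same bytes as Latin-1 turns C2 A3 into two characters.
wrong = as_bytes.decode("latin-1")
print(to_text(wrong) == to_text(original))        # False
print([hex(ord(c)) for c in wrong[:2]])           # ['0xc2', '0xa3']
```

Doing the conversion in one place means the comparison no longer depends on which stage (the web form, bleach, or sqlite) last touched the value.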