0

I'm testing in Python whether a string contains a given substring, as follows:

if substr in str:
  do_something()

The problem arises when substr contains letters with diacritics or other unusual characters. How would you recommend testing with such letters?

Thank you.

xralf

2 Answers

2

I do not know of any problems specific to diacritics in Python. The following works for me:

 >>> u"ł" in u"źdźbło"
 True

Edit:

 >>> u"ł" in u"źdźblo"
 False

The matching is exact. If diacritics-insensitive matching is what you want, specify this in your question and see Fredrik's answer.
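"Exact" here means exact at the code-point level. One case worth checking, sketched below on the assumption that the two strings come from different sources: the same visible letter can be stored either as a single precomposed code point or as a base letter plus a combining mark, and `in` treats those as different until both sides are normalized to the same form. The helper name `contains_normalized` is hypothetical.

```python
import unicodedata

def contains_normalized(needle, haystack, form="NFC"):
    """Hypothetical helper: bring both strings to one Unicode form, then test."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)

precomposed = u"\u017a"   # "ź" stored as a single code point
decomposed = u"z\u0301"   # "z" followed by COMBINING ACUTE ACCENT

# The plain test compares code points, so the two spellings do not match.
found_raw = precomposed in (decomposed + u"d")

# After normalizing both sides to NFC, they do.
found_normalized = contains_normalized(precomposed, decomposed + u"d")

print(found_raw, found_normalized)
```

This still distinguishes "ź" from "z"; it only removes differences in how the same letter is encoded.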

Edit2: Yes, for string literals containing non-ASCII characters you need to declare the encoding of the source file, on its first or second line. Something like this should work:

# coding: utf-8
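A slightly fuller sketch of how this might fit together, assuming the file is saved as UTF-8: the declaration covers literals written in the source, while a value that arrives as bytes (for example a title read from a file) has to be decoded to text before the test. The name `raw_title` and its UTF-8 encoding are assumptions made for illustration.

```python
# -*- coding: utf-8 -*-
# The declaration above must sit on the first or second line of the file (PEP 263).

# Assumption: raw_title stands in for bytes read from a file or socket, encoded as UTF-8.
raw_title = u"źdźbło trawy".encode("utf-8")

# Decode the bytes to text first, so both operands of `in` are Unicode strings.
title = raw_title.decode("utf-8")

has_l_stroke = u"ł" in title
print(has_l_stroke)
```

Python 3 treats source files as UTF-8 by default, so the declaration mainly matters for Python 2 or for files saved in another encoding.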
Rafał Dowgird
  • Dowgird: And what if I want to test `in title`? Can I write `in u"title`? – xralf Sep 20 '11 at 14:22
  • This raises `SyntaxError: Non-ASCII character '\xc5' in file fds on line 67, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details` – xralf Sep 20 '11 at 14:28
0

Use the solution outlined in this SO post to remove all diacritics from both strings before testing.
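The linked post is not reproduced here, so the following is only a sketch of one common approach, which may differ from it: decompose each string with `unicodedata.normalize("NFD", ...)`, drop the combining marks, and then run the substring test. The helper names `strip_diacritics` and `contains_ignoring_diacritics` are hypothetical.

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_diacritics(text):
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))

def contains_ignoring_diacritics(needle, haystack):
    """Substring test with diacritics removed from both operands."""
    return strip_diacritics(needle) in strip_diacritics(haystack)

print(contains_ignoring_diacritics(u"zdz", u"źdźbło"))

# Caveat: letters without a Unicode decomposition, such as "ł" (U+0142),
# pass through unchanged and would need an explicit mapping if "l" should match.
print(strip_diacritics(u"ł") == u"ł")
```

Because "ł" is a distinct letter rather than "l" plus a combining mark, this approach alone does not make `u"l"` match `u"źdźbło"`.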

Community
Fredrik Pihl