I have the following list of strings:

exclude = ['eee', 'iii']

I have a word to test:

word = 'Iîïe'

I want the following test to be true:

if any(x in word for x in exclude):
    pass  # I want to end up here!

For the test to be true, the comparison needs to be both case-insensitive and accent-insensitive. How can I do that?

Vincent
  • `in` just does substring searching. Therefore it's case and accent sensitive – OneCricketeer Oct 02 '16 at 13:16
  • I have no idea how to do accent insensitivity; you would have to define that further. For case insensitivity, `any([x in word.lower() for x in exclude])` would do the job. Are you sure you don't mean `exclude = ['e', 'i']`? – timakro Oct 02 '16 at 13:16
  • See this [answer](http://stackoverflow.com/a/29247821/4099593). – Bhargav Rao Oct 02 '16 at 13:47

1 Answer

You can use a third-party package called unidecode:

What Unidecode provides is a middle road: function unidecode() takes Unicode data and tries to represent it in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F), where the compromises taken when mapping between two character sets are chosen to be near what a human with a US keyboard would choose.

Example:

from unidecode import unidecode
...
if any(x in unidecode(word).lower() for x in exclude):
    ...
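
If installing a package is not an option, a similar effect can be sketched with the standard library alone. This is a minimal sketch, not what unidecode does internally: the helper name `fold` is made up for illustration, and it assumes that "accent-insensitive" means dropping combining marks after NFKD normalization.

```python
import unicodedata

def fold(text):
    """Hypothetical helper: strip combining accents and fold case."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks, keeping only the base characters.
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()

exclude = ['eee', 'iii']
word = 'Iîïe'

# Fold both sides so accented or uppercase entries in exclude also match.
matched = any(fold(x) in fold(word) for x in exclude)
print(matched)
```

Unlike unidecode, this approach leaves letters that have no Unicode decomposition (for example 'ø') unchanged, so it covers fewer cases.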
Selcuk
  • Do you know how resource-intensive it is? – Vincent Oct 02 '16 at 13:20
  • I have no idea, but you should be able to benchmark it using sample data from your use case. I also suggest reading the "Performance notes" section on the project page. – Selcuk Oct 02 '16 at 13:22
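
One way to run such a benchmark is with the standard `timeit` module. The sketch below is only an illustration: the sample words are made up, and it times a standard-library accent-stripping helper (the hypothetical `fold`) so that it runs without extra packages. To measure unidecode instead, replace the call to `fold(w)` with `unidecode(w).lower()`.

```python
import timeit
import unicodedata

def fold(text):
    """Hypothetical helper: strip combining accents and fold case."""
    decomposed = unicodedata.normalize('NFKD', text)
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

exclude = ['eee', 'iii']
# Made-up sample data; substitute words from your real use case.
sample_words = ['Iîïe', 'été', 'naïve', 'hello'] * 250

def run_check():
    # Apply the membership test from the question to every sample word.
    return [any(x in fold(w) for x in exclude) for w in sample_words]

passes = 10
elapsed = timeit.timeit(run_check, number=passes)
print(f'{elapsed / passes:.6f} s per pass over {len(sample_words)} words')
```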