
I have strings from a JSON source that contain emojis, and because of that I can't insert them into my MySQL database properly.

Fortunately I don't need to store the exact string, so I can use unidecode(mystring), which solves the problem. However, I can't tell which strings need this treatment.

So my question is: how can I detect the strings that contain emojis in an if statement, and unidecode only those?

The best solution would be if I could do something like this:

if emoji in base_string: 
    new_string = unidecode(base_string)
else:
    new_string = base_string 
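
One way to fill in the `emoji in base_string` test is to look for code points above U+FFFF, since those are the characters that need four bytes in UTF-8 and that MySQL's legacy `utf8` (utf8mb3) character set cannot store. The following is a minimal sketch assuming Python 3, where a `str` is a sequence of code points; the names `contains_astral_char` and `sanitize` are hypothetical, and the transliteration function is passed in so the block runs without third-party packages. Note the assumption that "emoji" here means "anything outside the Basic Multilingual Plane": a few emoji such as U+2764 are inside it and will not be flagged, but those fit in three bytes.

```python
from typing import Callable


def contains_astral_char(text: str) -> bool:
    """Return True if any code point lies above U+FFFF.

    Such characters take four bytes in UTF-8. Most emoji fall in
    this range, but this is a heuristic, not a full emoji detector.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def sanitize(text: str, transliterate: Callable[[str], str]) -> str:
    """Apply transliterate only to strings that contain astral characters."""
    if contains_astral_char(text):
        return transliterate(text)
    return text


if __name__ == "__main__":
    # Stand-in for unidecode: drop every character above U+FFFF.
    def strip_astral(s: str) -> str:
        return "".join(ch for ch in s if ord(ch) <= 0xFFFF)

    print(sanitize("plain text", strip_astral))
    print(sanitize("party \U0001F389 time", strip_astral))
```

With the unidecode package from the question installed, the call would be `sanitize(base_string, unidecode)`, which leaves strings without such characters untouched.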

So far I have tried these solutions without any success:

if "xF0" in base_string: 
    new_string = unidecode(base_string)
else:
    new_string = base_string 

if "U000" in base_string: 
    new_string = unidecode(base_string)
else:
    new_string = base_string 

After reading about the topic I was able to log the strings with base_string.encode('utf-8'), but I still can't figure out how to check for a byte/str match, so I would really appreciate it if somebody could show me the right way.
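
On the byte/str matching specifically: the literal text "xF0" never appears in the decoded string, which is why the checks above do not match. A byte-level test has to run on the encoded bytes instead. The sketch below assumes Python 3 and uses a hypothetical helper name, `has_four_byte_sequence`; it relies on the fact that in valid UTF-8 a byte of 0xF0 or higher can only be the lead byte of a four-byte sequence.

```python
def has_four_byte_sequence(text: str) -> bool:
    """Return True if the UTF-8 encoding of text contains a 4-byte sequence.

    In valid UTF-8, continuation bytes are 0x80-0xBF, so any byte
    >= 0xF0 must be the lead byte of a four-byte character.
    """
    encoded = text.encode("utf-8")
    # Iterating over bytes in Python 3 yields integers.
    return any(byte >= 0xF0 for byte in encoded)


if __name__ == "__main__":
    sample = "smile \U0001F600"
    print("xF0" in sample)                 # the attempted check from the question
    print(has_four_byte_sequence(sample))  # the byte-level check
```

This is equivalent to testing for code points above U+FFFF on the decoded string; it is shown only to make the relationship between the logged bytes and the string explicit.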

rihekopo
  • Is the transformation something that you couldn't just apply to *all* the content? – Joshua Taylor Aug 11 '17 at 19:07
  • @JoshuaTaylor unfortunately I can't do that. Only about 10% of the strings contain emojis; the other 90% should remain the same. – rihekopo Aug 11 '17 at 19:17
  • You may want to check this question: https://stackoverflow.com/questions/33404752/removing-emojis-from-a-string-in-python – Cedric Zoppolo Aug 11 '17 at 19:58
  • @Cedric, I can remove them too; unidecode(mystring) does that. I did not ask how to remove the emojis. – rihekopo Aug 11 '17 at 20:05
  • Although you did not ask how to remove the emojis, it stands to reason that if you knew how to remove them, you would know how to detect them. Your problem is a subset of the other problem: in order to remove them, one must first detect them. – Cody Gray - on strike Aug 11 '17 at 20:10
  • Don't use any method for this, if python work under `binary mode` all errors will be resolved. Check accepted data for type but "why don't you do it on the user side ?". Unicode or UTF-8 make your choice cos all-time need more filter for reusing saved data ! – dsgdfg Aug 11 '17 at 20:47
