
I have a JSON object {u'nickname':u'\U0001f638\U0001f638\u5bb6\u52c7'}. When I save nickname to the database, it raises:

DatabaseError: (1366, "Incorrect string value: '\\xF0\\x9F\\x98\\xB8\\xF0\\x9F...' for column 'nickname' at row 1")

I think \U0001f638\U0001f638 is the problem; these seem to be some kind of image code. How can I detect such characters in a string and remove them?

Mithril

1 Answer


I found the answer here. Emoji information: http://punchdrunker.github.io/iOSEmoji/table_html/index.html

\U0001f638 is an iOS emoji character. Use Martijn Pieters's code to match such characters:

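import re

try:
    # Wide (UCS-4) build: match any code point above the BMP directly
    highpoints = re.compile(u'[\U00010000-\U0010ffff]')
except re.error:
    # Narrow (UCS-2) build: match surrogate pairs instead
    highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')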

>>> import re
>>> highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')
>>> example = u'\U0001f638\U0001f638\u5bb6\u52c7'
>>> highpoints.sub(u'', example)
u'\u5bb6\u52c7'

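It works! Note that the session above compiles the surrogate-pair pattern directly, which is what a narrow (UCS-2) Python 2 build needs; on a wide build, the first pattern from the try block is the one that matches.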
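For readers on Python 3, where strings are always full Unicode and the narrow-build fallback is not needed, the same idea can be sketched as below. The names NON_BMP, has_non_bmp, and strip_non_bmp are hypothetical, introduced only for illustration. The bytes in the error, \xF0\x9F\x98\xB8, are the four-byte UTF-8 encoding of U+1F638, so the sketch assumes the column rejects any character outside the Basic Multilingual Plane; the question does not show the schema, so treat that as an assumption.

```python
import re

# Any code point above U+FFFF, i.e. outside the Basic Multilingual Plane.
# These are the characters that need four bytes in UTF-8.
NON_BMP = re.compile('[\U00010000-\U0010ffff]')


def has_non_bmp(text):
    """Return True if text contains at least one non-BMP character."""
    return NON_BMP.search(text) is not None


def strip_non_bmp(text):
    """Return text with every non-BMP character removed."""
    return NON_BMP.sub('', text)


nickname = '\U0001f638\U0001f638\u5bb6\u52c7'
detected = has_non_bmp(nickname)   # detection step from the question
cleaned = strip_non_bmp(nickname)  # removal step from the answer
```

This removes every supplementary-plane character, not only emoji, so rare CJK ideographs above U+FFFF would be dropped as well.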

Community
Mithril