I need to convert (in Python) any 4-byte UTF-8 character into some other character, so that I can insert the text into my utf-8 MySQL database without getting an error such as: "Incorrect string value: '\xF0\x9F\x94\x8E' for column 'line' at row 1"
The question "Warning raised by inserting 4-byte unicode to mysql" shows this way to do it:
>>> import re
>>> highpoints = re.compile(u'[\U00010000-\U0010ffff]')
>>> example = u'Some example text with a sleepy face: \U0001f62a'
>>> highpoints.sub(u'', example)
u'Some example text with a sleepy face: '
However, I get the same error as the user in the comment there: "...bad character range.." This is apparently because my Python is a UCS-2 (narrow) build rather than a UCS-4 (wide) build, so the pattern cannot contain code points above U+FFFF as single characters. What should I do instead?
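For illustration, here is a minimal sketch of one possible workaround. It assumes that a narrow build stores each character above U+FFFF as a surrogate pair (a code unit in D800-DBFF followed by one in DC00-DFFF), so the pair can be matched instead of the single code point. The helper name strip_4byte and its replacement parameter are hypothetical, not taken from the linked question.

```python
import re

try:
    # Wide (UCS-4) builds and Python 3: each character above U+FFFF
    # is a single code point, so the original range compiles.
    highpoints = re.compile(u'[\U00010000-\U0010ffff]')
except re.error:
    # Narrow (UCS-2) builds: the range above fails with
    # "bad character range", so match a high surrogate followed
    # by a low surrogate instead.
    highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def strip_4byte(text, replacement=u''):
    """Replace every character that needs 4 bytes in UTF-8 (hypothetical helper)."""
    return highpoints.sub(replacement, text)


example = u'Some example text with a sleepy face: \U0001f62a'
print(repr(strip_4byte(example)))
print(repr(strip_4byte(example, u'?')))
```

Passing a replacement such as u'?' keeps a visible placeholder where the character was, which may be preferable to silently dropping it.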