Say I have the following HTML emoji entity: '&#x1f604 ;'
Note: there isn't actually a space between the 4 and the ; (it's only there so that the entity doesn't render as a smiley).
The emoji's Python form is: u"\U0001f604"
How do I convert all HTML emoji entities to their Python form?
Things I have tried so far:
- Encoding the text to UTF-8
- Unescaping the text using HTMLParser and then converting
- Using a regex (couldn't get something that worked for all of the HTML emoji entities -- it's not as simple as swapping &#x with \U000, as that only works for entities with exactly five hex digits)
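
For reference, here is a minimal sketch of the unescape and regex attempts from the list above. It assumes Python 3, where every str is Unicode (so the decoded character equals u"\U0001f604"), and it assumes the entities are well-formed, i.e. they end in ; with no space. The function names are made up for illustration.

```python
import html
import re

# Matches hexadecimal numeric character references such as "&#x1f604;".
# Assumption: entities are terminated by ";" with no space before it.
_HEX_ENTITY = re.compile(r"&#[xX]([0-9a-fA-F]+);")


def unescape_entities(text):
    """Decode all HTML character references using the standard library."""
    return html.unescape(text)


def unescape_hex_entities(text):
    """Decode only hexadecimal references by parsing the code point.

    Converting the hex digits to an int avoids having to pad the
    digits to the eight required by a \\U escape.
    """
    return _HEX_ENTITY.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    sample = "smile: &#x1f604;"
    print(unescape_entities(sample) == "smile: \U0001f604")
    print(unescape_hex_entities(sample) == "smile: \U0001f604")
```

Parsing the hex digits as a number sidesteps the string-swapping problem: the number of leading zeros needed in a \U escape depends on how many digits the entity has, whereas chr() takes the code point directly.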