Printing out all unicode emojis to file

Question

It's possible to print an emoji from its hex code with the u'\uXXXX' pattern in Python, e.g.

>>> print(u'\u231B')
⌛

However, if I have a list of hex codes like 231B, just concatenating the strings won't work:

>>> print(u'\u' + ' 231B')
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

Passing the string to chr() fails too:

>>> chr('231B')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required (got type str)

The first part of my question: given a hex code, e.g. 231A, how do I get the emoji as a str?

My goal is to get the list of emojis from https://unicode.org/Public/emoji/13.0/emoji-sequences.txt by reading the hex codes in the first column.

Some entries are ranges such as 231A..231B. The second part of my question: given a hex code range, how do I iterate through it to get each emoji str? For 2648..2653 it is possible to write range(2648, 2653+1), but if the hex code contains a letter, e.g. 1F232..1F236, using range() is not possible.

Thanks @amadan for the solutions!!

TL;DR

To write the list of emojis from https://unicode.org/Public/emoji/13.0/emoji-sequences.txt to a file:

import requests

URL = 'https://unicode.org/Public/emoji/13.0/emoji-sequences.txt'
response = requests.get(URL)

# Write UTF-8 explicitly; the platform default encoding may not handle emoji.
with open('emoji.txt', 'w', encoding='utf8') as fout:
    for line in response.content.decode('utf8').splitlines():
        # Skip blank lines and comment lines.
        if line.strip() and not line.startswith('#'):
            # The first ';'-separated column holds the code points.
            hexa = line.split(';')[0].strip().split('..')
            if len(hexa) == 1:
                # A single code point or a space-separated sequence.
                ch = ''.join(chr(int(h, 16)) for h in hexa[0].split())
                print(ch, file=fout)
            else:
                # An inclusive range such as 231A..231B.
                start, end = hexa
                for cp in range(int(start, 16), int(end, 16) + 1):
                    print(chr(cp), file=fout)
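The same parsing logic can be tried without a network request. The sketch below is one possible way to factor it, not part of the original solution: the names expand_field and parse_lines are hypothetical, and the sample lines only imitate the file's semicolon-separated layout with placeholder text in the later columns.

```python
def expand_field(field):
    """Yield one str per emoji described by a first-column field.

    Accepts 'XXXX' (single code point), 'XXXX YYYY' (sequence)
    and 'XXXX..YYYY' (inclusive range).
    """
    parts = field.strip().split('..')
    if len(parts) == 2:
        start, end = (int(p, 16) for p in parts)
        for cp in range(start, end + 1):  # +1 because the file's ranges are inclusive
            yield chr(cp)
    else:
        yield ''.join(chr(int(h, 16)) for h in parts[0].split())


def parse_lines(lines):
    """Collect emoji strings from lines, skipping blanks and '#' comments."""
    emojis = []
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue
        emojis.extend(expand_field(line.split(';')[0]))
    return emojis


# Placeholder lines that mimic the layout; only the first column matters here.
sample = [
    '# a comment line',
    '',
    '231A..231B ; placeholder_type ; placeholder description',
    '00A9 FE0F ; placeholder_type ; placeholder description',
]
emojis = parse_lines(sample)
print(emojis)
```

Feeding response.text.splitlines() to parse_lines should then give the same strings that the loop above writes to the file.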

Amadan · Accepted Answer · 2020-03-09T05:47:56.410


Convert the hex string to a number, then use chr:

chr(int('231B', 16))
# => '⌛'

or directly use a hex literal:

chr(0x231B)

To use a range, again, you need an int, either converted from a string or using a hex literal:

''.join(chr(c) for c in range(0x2648, 0x2654))
# => '♈♉♊♋♌♍♎♏♐♑♒♓'

or

''.join(chr(c) for c in range(int('2648', 16), int('2654', 16)))

(NOTE: you'd get something very different from range(2648, 2654)!)
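To make the note concrete, here is a small sketch of an inclusive-range helper. The name chars_in_range is hypothetical; it takes both endpoints as hex strings, so code points containing letters such as 1F232 work the same way as all-digit ones.

```python
def chars_in_range(start_hex, end_hex):
    """Return the characters from start_hex to end_hex, both inclusive."""
    start, end = int(start_hex, 16), int(end_hex, 16)
    return ''.join(chr(cp) for cp in range(start, end + 1))


print(int('2648', 16))                        # 9800, not 2648
print(chars_in_range('2648', '2653'))         # the twelve zodiac signs
print(len(chars_in_range('1F232', '1F236')))  # 5
```

Reading 2648 as a decimal number selects a completely different code point, which is why range(2648, 2654) does not give the zodiac signs.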

edited Mar 09 '20 at 05:47

answered Mar 09 '20 at 05:40

Amadan


Does `int('2654', 16)` include 2654? – alvas Mar 09 '20 at 05:50
It is an integer, it doesn't have a concept of inclusion. `int('2654', 16)` is equal to `0x2654` and 9812, representing the code point `'WHITE CHESS KING'`. If you are asking if `range(0x2648, 0x2654)` includes `0x2654`, then no, it does not, since `range` never includes its endpoint; see [Why does range(start, end) not include end?](https://stackoverflow.com/questions/4504662/why-does-rangestart-end-not-include-end) – Amadan Mar 09 '20 at 05:53
Ah, but the ranges in the Unicode emoji file are inclusive, so `int('2654', 16) + 1` =) – alvas Mar 09 '20 at 05:54
No, you said `2648..2653`; I already added the needed 1. If you want, you can write `int('2653', 16) + 1`. – Amadan Mar 09 '20 at 05:56
Ah yes, my fault, I didn't read the range properly. You're right, it's already +1 =) – alvas Mar 09 '20 at 05:57

How about emojis like `00A9 FE0F`? – alvas Mar 09 '20 at 06:08
Those are just two characters next to each other: `chr(0xa9) + chr(0xfe0f)`. – Amadan Mar 09 '20 at 06:10
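For entries with several space-separated code points, the same idea can be wrapped in a helper. This is only a sketch, and the name sequence_to_str is hypothetical:

```python
def sequence_to_str(field):
    """Join space-separated hex code points, e.g. '00A9 FE0F', into one str."""
    return ''.join(chr(int(h, 16)) for h in field.split())


text = sequence_to_str('00A9 FE0F')
print(text)
print(len(text))  # 2, because the str holds two code points
```

The result is a single str, but len() still counts each code point separately.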
