Hi folks,

Is there a fast way to check whether a given character is an emoji?

So far I have found the following solution:

import emoji
character in emoji.UNICODE_EMOJI

But this does not seem to be the best approach: to check whether a given character is in the dict, you have to compute a hash and do a lookup. What I have in mind: maybe it is possible to just check whether the character's code point falls inside one of the ranges of Unicode code points that contain emojis. Any ideas how to implement this?
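To make the range idea concrete, here is a minimal sketch in Python 3. The range table is an assumption for illustration only: it lists a handful of Unicode blocks that contain many emojis, it is not a complete emoji list, and not every code point inside these blocks is an emoji. The name `in_emoji_range` is hypothetical.

```python
from bisect import bisect_right

# Illustrative, incomplete table of (start, end) code point ranges,
# sorted by start. These are Unicode blocks that contain many emojis;
# they also contain some code points that are NOT emojis.
_EMOJI_RANGES = [
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
]
_RANGE_STARTS = [start for start, _ in _EMOJI_RANGES]


def in_emoji_range(character):
    """Return True if the single character falls inside one of the ranges."""
    code_point = ord(character)
    # Find the last range whose start is <= code_point.
    index = bisect_right(_RANGE_STARTS, code_point) - 1
    return index >= 0 and code_point <= _EMOJI_RANGES[index][1]
```

For example, `in_emoji_range("\U0001F600")` is true and `in_emoji_range("a")` is false. Whether this beats a dict lookup in practice would have to be measured, since the binary search runs in Python code while the dict lookup runs in C.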

Thanks in advance!

Egor Savin

1 Answer

If what you are looking for is faster lookups in a list and you don't have any duplicates, you can try replacing list() with a set() instead.

Similar problem: https://stackoverflow.com/a/5993659/7570485

Update:

As far as I know, you can't get any faster lookups than with a dict(). The average time complexity of a dict lookup is O(1). You could try intern() from the sys module, which may give a small performance boost.

Source: https://stackoverflow.com/a/40694623/7570485
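A rough way to compare the container types mentioned above is to time a membership test with `timeit`. This is only a sketch for Python 3: the code point span is a made-up stand-in for real emoji data, and the helper name `time_lookup` is hypothetical.

```python
import timeit

# Hypothetical stand-in data: every character in an arbitrary code point span.
characters = [chr(cp) for cp in range(0x1F300, 0x1F700)]

as_list = characters
as_set = set(characters)
as_dict = dict.fromkeys(characters, True)

# Probe with the last element, which is the worst case for the list scan.
probe = characters[-1]


def time_lookup(container, number=10000):
    """Total seconds spent on `number` membership tests of `probe`."""
    return timeit.timeit(lambda: probe in container, number=number)


timings = {
    "list": time_lookup(as_list),
    "set": time_lookup(as_set),
    "dict": time_lookup(as_dict),
}

for name, seconds in timings.items():
    print(name, seconds)
```

The absolute numbers depend on the machine and Python version, so treat them only as a relative comparison.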

Xnkr
  • Thank you =)) emoji.UNICODE_EMOJI is already a dict, so it already has this hash-table magic... But you're right, if it were a list, then it could help! =) – Egor Savin Aug 17 '18 at 14:53
  • I assumed you had a list, since you described it as checking a `char` in a `list`. Please change it to `dict` to avoid further confusion. – Xnkr Aug 17 '18 at 19:35
  • You're welcome. Please mark the question as answered :) – Xnkr Aug 19 '18 at 16:15
  • Done! One more question: do you know if there is any way to intern() a unicode string in Python 2.7 as well? – Egor Savin Aug 19 '18 at 16:21
  • I believe `intern()` is a built-in function in Python 2.7. In Python 3, it has been moved to `sys.intern()`. If you want compatibility with both, you can `try` importing `from sys import intern`, `pass` on `except`, and just use `intern` throughout the code. – Xnkr Aug 19 '18 at 16:28
  • Yep, but unfortunately, if I try to intern() a unicode object in Python 2.7, I get this error: **TypeError: intern() argument 1 must be string, not unicode.** – Egor Savin Aug 19 '18 at 16:30
  • Yeah, a lot of weird things happen in Python 2.7 when using Unicode. You can try encoding it first, for example as UTF-8, if that suits you. Or better yet, stick to Python 3 :) – Xnkr Aug 19 '18 at 16:34
  • Heh, you're right. Upgrading Python will probably be my next step =) The latest Python version, 3.7, seems to be faster than Python 2.7! – Egor Savin Aug 19 '18 at 16:42
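
The compatibility import described in the comments above could look roughly like this. It is a sketch, not tested on Python 2.7 here; the variable names and the sample string are made up for illustration.

```python
try:
    # Python 3: intern() lives in the sys module.
    from sys import intern
except ImportError:
    # Python 2: intern() is already a built-in, nothing to import.
    pass

# Build the second string at runtime so it is a separate object
# before interning.
key = intern("smile")
same_key = intern("".join(["smi", "le"]))

# Interned strings with equal content are the same object, so they can
# be compared by identity.
print(key is same_key)
```

As noted in the comments, the Python 2.7 built-in only accepts `str`, so a `unicode` object would have to be encoded first.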