
I know that dealing with emojis in a string can get complicated, but if I only want to know whether a string starts with an emoji (any emoji) or not, is there an easy way?

Sorry if this turns out to be a very basic question, but the only thing my search has turned up so far is this C# one: "How can i tell a string starts with an emoji and get the first emoji in the string, without using regex?", which I don't really know how to make use of... Thank you.

Rai

1 Answer


To simplify the solution, let's assume all emojis belong to two Unicode ranges:
U+2600...U+27FF and U+01F300...U+01F6FF

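-- Returns true when the UTF-8 string s starts with an emoji, nil otherwise.
-- Byte patterns: F0 9F 8C..9B = U+1F300..U+1F6FF, E2 98..9F = U+2600..U+27FF.
-- The \x escapes require Lua 5.2 or later.
function starts_with_emoji(s)
    if s:find("^\xF0\x9F[\x8C-\x9B]") or s:find("^\xE2[\x98-\x9F]") then
        return true
    end
end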
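
The same check can also be written against code points instead of raw bytes, which makes the ranges easier to read and extend. This is only a minimal sketch: it assumes Lua 5.3 or later (for the `utf8` library), and the names `EMOJI_RANGES` and `starts_with_emoji_cp` are hypothetical. The range table is deliberately incomplete and only mirrors the two ranges assumed above.

```lua
-- Sketch only: assumes Lua 5.3+ (utf8 library).
-- EMOJI_RANGES is a hypothetical, deliberately incomplete table of
-- inclusive code point ranges; extend it as needed.
local EMOJI_RANGES = {
    {0x2600,  0x27FF},   -- first range assumed in the answer
    {0x1F300, 0x1F6FF},  -- range matched by the \xF0\x9F[\x8C-\x9B] pattern
}

-- Returns true if s starts with a code point inside one of the ranges,
-- false otherwise (including for empty or invalid UTF-8 input).
function starts_with_emoji_cp(s)
    if s == "" then return false end
    -- utf8.codepoint raises an error on invalid UTF-8, so guard it with pcall
    local ok, cp = pcall(utf8.codepoint, s, 1)
    if not ok then return false end
    for _, range in ipairs(EMOJI_RANGES) do
        if cp >= range[1] and cp <= range[2] then
            return true
        end
    end
    return false
end

print(starts_with_emoji_cp("\xE2\x98\x80 sunny"))  --> true  (U+2600)
print(starts_with_emoji_cp("plain text"))          --> false
```

Any code point outside the table, for example U+1F926, is reported as `false` until a range covering it is added.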
ESkri
  • could shorten this to `not not (s:find"^\xF0\x9F[\x8C-\x9B]" or s:find"^\xE2[\x98-\x9F]")`, which would also guarantee a boolean as output – Luatic Mar 08 '23 at 16:44
  • So it has turned out to be as tricky as I was afraid, or even more… I thought that limiting the search to the start of the string would somehow ease things, or at least make it more reliable, but if whether it works is going to depend on the type of emoji, I'm not sure it's the way to go for the use I had in mind… Anyway, as you point out, your code seems to work perfectly as long as you don't use _composed_ emojis such as 🤦, so, if no other approaches appear, I'll accept the answer, assuming that trying to cover all the ranges is simply not a feasible option… Well, thanks to both! – Rai Mar 08 '23 at 20:02
  • `cover all the ranges is simply not a feasible option` - it is feasible: the Unicode description table (with info about every symbol) is available. It is fairly long, but it is quite realistic to copy it into your Lua library. – ESkri Mar 09 '23 at 00:00
  • Your emoji is `U+1F926`, so change the range to `...U+01FFFF` instead of `...U+01F6FF` – ESkri Mar 09 '23 at 00:03
  • Oh, I'll try that, thanks! But to be sure: however much of the known range I try to cover, there's still the risk of the ranges being extended as new emojis appear, isn't there? So I've been wondering whether there could be a more general approach.. What if I simply got the length of the first char, e.g. `#(" Name"):match"^.?[\128-\191]*"`, and, since as far as I know emojis are (always?) _composed_ characters, treated a result greater than 1 as meaning it's an emoji or special char? It's just an example and it would be range dependent too, but hopefully there's a better way along those lines, and I just wanted to put it out there for now.. – Rai Mar 09 '23 at 16:40
  • What does "composed" mean? An emoji is a regular Unicode symbol. Lua can determine the number of bytes in the first UTF-8 char, but how would you distinguish emojis from Chinese/Japanese characters? – ESkri Mar 09 '23 at 23:42
  • Yeah, sorry, I admit I should learn more about all that... I said "composed" because I saw emojis as some kind of special characters formed by combining other, simpler ones, a deduction based on the fact that Lua counts them as several characters. But now I see that such a deduction may be totally wrong (as I gather from your words). The thing is, I can't know beforehand what kind of emoji the string is going to start with, so how could I be prepared for any kind of emoji, whether it's from China/Japan or wherever? All I need to know is whether the first one is an emoji/special char... – Rai Mar 10 '23 at 01:55
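
The first-character length idea from the comments can be tried directly. The sketch below assumes valid UTF-8 input; `first_char_len` is a hypothetical helper name wrapping the pattern quoted in the comments. It suggests why byte length alone is not enough: a CJK ideograph and a symbol from the U+2600 range both occupy three bytes.

```lua
-- Sketch only: assumes s is valid UTF-8. first_char_len is a hypothetical name.
-- Returns the number of bytes used by the first character of s.
function first_char_len(s)
    -- one lead byte followed by any continuation bytes (0x80-0xBF)
    return #(s:match("^.?[\128-\191]*"))
end

print(first_char_len("abc"))               --> 1  (ASCII letter)
print(first_char_len("\xE4\xB8\xAD"))      --> 3  (U+4E2D, a CJK ideograph)
print(first_char_len("\xE2\x98\x80"))      --> 3  (U+2600)
print(first_char_len("\xF0\x9F\xA4\xA6"))  --> 4  (U+1F926)
```

A length greater than 1 therefore only says that the first character is not ASCII; telling emojis apart from other non-ASCII characters still needs some table of code point ranges.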