49

I have a small problem.

I'm using Node.js as the backend. A user has a "biography" field, where they can write something about themselves.

Suppose this field has a maxlength of 220, and suppose this is the input:

‍♀️‍♀️‍♀️‍♀️‍♀️‍♀️‍⚕️‍⚕️‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ 

As you can see, there aren't 220 emojis (there are 37), but if I run this on my Node.js server

console.log(bio.length)

where bio is the input text, I get 221. How can I "parse" the input string to get the correct length? Is this a Unicode problem?

SOLVED

I used this library: https://github.com/orling/grapheme-splitter

I tried this:

var Grapheme = require('grapheme-splitter');
var splitter = new Grapheme();
console.log(splitter.splitGraphemes(bio).length);

and the length is 37. It works very well!

Stackedo
  • https://blog.jonnew.com/posts/poo-dot-length-equals-two is a good explanation. – Luke Jan 25 '19 at 16:53
  • Possible duplicate of [Counting unicode characters in Javascript](https://stackoverflow.com/questions/25292855/counting-unicode-characters-in-javascript) – Heretic Monkey Jan 25 '19 at 16:54
  • `Buffer.from(string).length` – alphapilgrim Jan 25 '19 at 16:58
  • 2
    The `Buffer.from(string).length` returns 481 – Stackedo Jan 25 '19 at 17:02
  • 1
    One thing to consider is what matters: storage allocation in bytes, or the number of characters that can be entered. Remember that even accented characters can exist in combined and non-combined forms. Think three bytes vs two bytes. Also, is the text stored as UTF-8 or UTF-16? This post is a good read: https://blog.jonnew.com/posts/poo-dot-length-equals-two – Andre M Jan 25 '19 at 17:04
  • I'm using MongoDB as the database, so it is stored as UTF-8. – Stackedo Jan 25 '19 at 17:06
  • I've tried grapheme-splitter `new GraphemeSplitter().countGraphemes('‍‍‍‍️‍');` results in `4`. So not entirely reliable. – Ste Apr 23 '21 at 09:52

9 Answers

54
  1. str.length gives the count of UTF-16 code units.

  2. A Unicode-aware way to get the string length in code points is [...str].length, because the iterable protocol splits the string into code points. Note that a code point is not always what a user perceives as one character.

  3. If we need the length in graphemes (grapheme clusters), we have these native ways:

    a. Unicode property escapes in RegExp. See for example: Unicode-aware version of \w or Matching emoji.

    b. Intl.Segmenter. At the time of writing it was expected to land around ES2021. It could be tested behind a flag in recent V8 versions (the implementation was synced with the latest spec in V8 8.6) and was unflagged (shipped) in V8 8.7.
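The three units can be compared side by side. This is a minimal sketch, assuming a runtime where Intl.Segmenter is available (for example a recent Node.js release); the helper names are made up for illustration and the test string is written with escapes so that it survives copy and paste.

```javascript
// Hypothetical helpers: one per unit of "length" discussed above.

// 1. UTF-16 code units (what str.length reports).
function utf16Length(str) {
  return str.length;
}

// 2. Code points: string iteration walks code points, not code units.
function codePointLength(str) {
  return [...str].length;
}

// 3. Grapheme clusters, via Intl.Segmenter (assumed to be available).
function graphemeLength(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(str)) {
    count += 1;
  }
  return count;
}

// "family: man, woman, girl": three emojis joined by U+200D (zero width joiner).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(utf16Length(family));     // 8
console.log(codePointLength(family)); // 5
console.log(graphemeLength(family));  // 1
```

Which of the three numbers should be checked against a limit of 220 depends on whether the limit is about storage or about what the user sees.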


vsemozhebuty
  • 6
    Does spreading the example string return 37, though? I get 130. – Alex K. Jan 25 '19 at 16:53
  • 3
    Well, then we need to define in what units the max length should be. We have 221 UTF-16 units, 131 Unicode points (characters) or 37 combined graphemes. – vsemozhebuty Jan 25 '19 at 17:01
  • 5
    The question is pretty clearly asking for code that would output 37, rather than 130. `[...str].length` is incorrect for counting emojis as a single unit. You might want to clarify this in your answer so that you don't cause people unnecessary trouble. – joe Jul 28 '20 at 13:30
  • 1
    none of these gives correct answer for `‍‍‍` – zxch3n Nov 16 '21 at 08:39
  • 7
    @R3m `[...new Intl.Segmenter().segment('‍‍‍')].length` gives `1` (grapheme) if you need `1`. – vsemozhebuty Nov 16 '21 at 19:55
  • @R3m Also, in some cases, this proposal will help: https://github.com/tc39/proposal-regexp-unicode-sequence-properties – vsemozhebuty Nov 16 '21 at 19:58
  • Your second, "Unicode-proof" way does not work with 🇯🇵; it returns a length of `2` – Antoine Weber Sep 24 '22 at 08:44
  • @AntoineWeber That is because there are 2 Unicode characters there: `[...'🇯🇵'].map(char => char.codePointAt(0).toString(16))` gives `['1f1ef', '1f1f5']`. You need the third way here: `[...new Intl.Segmenter().segment('🇯🇵')].length` is `1`. – vsemozhebuty Sep 24 '22 at 12:31
5

TL;DR there are solutions, but they don’t work in every case. Unicode can feel like a dark art.

There seem to be limitations in the various solutions I have seen, and the issue goes beyond emojis to other characters in the Unicode range. Consider that é can be stored as a single precomposed character (U+00E9) or as e followed by a combining acute accent (U+0301) when combining characters are used. This can even lead to two strings that look the same not being equal. Also note that in certain cases a single emoji can take 11 UTF-16 code units when stored, and as a result 22 bytes, assuming UTF-16.
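The "look the same but are not equal" case can be reproduced in a few lines. This is a small sketch; `sameText` is a hypothetical helper name, and normalizing to NFC before comparing is one possible approach rather than the only one.

```javascript
const precomposed = "\u00E9"; // é as a single code point
const decomposed = "e\u0301"; // e followed by U+0301 COMBINING ACUTE ACCENT

console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 1 2

// Hypothetical helper: compare two strings after normalizing both to NFC.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(sameText(precomposed, decomposed)); // true
```

Normalizing input before measuring or storing it also makes the reported length independent of which form the user's keyboard happened to produce.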

The way this is handled and how characters are combined, or displayed, can even vary between browsers and operating systems. So, while you may think you cracked it, there is a risk another environment breaks this. Be sure to test where it matters.

Then there is the front-end versus back-end problem: you solve the character-count problem so that it works well for human users, and now a single emoji blows right past the field size allocated in the database. This is less of an issue with databases such as MongoDB, but it can be one with SQL databases where field allocation was conservative. How you solve your problem therefore depends on where the hardest limit applies.
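One way to respect both limits is to check the user-facing count and the storage size together. The sketch below assumes Node.js (for `Buffer`) and a runtime with Intl.Segmenter; `checkBio` and its limits are hypothetical names and values chosen for illustration.

```javascript
// Hypothetical validator: checks a grapheme limit (what the user sees)
// and a UTF-8 byte limit (what the storage layer may enforce).
function checkBio(bio, maxGraphemes, maxBytes) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemes = [...segmenter.segment(bio)].length;
  const bytes = Buffer.byteLength(bio, "utf8");
  return {
    graphemes,
    bytes,
    ok: graphemes <= maxGraphemes && bytes <= maxBytes,
  };
}

// One visible emoji ("family: man, woman, girl") that needs 18 bytes in UTF-8.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(checkBio(family, 220, 880)); // { graphemes: 1, bytes: 18, ok: true }
console.log(checkBio(family, 220, 10).ok); // false: fits visually, not in 10 bytes
```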

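Note that a basic solution involves converting the string to an array and taking its length, accepting the limitations:

Array.from(str).length

This counts code points, so characters from the astral planes count once each, but it falls apart when characters are combined, for example with combining marks or zero-width joiners.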

A few high level approaches, that take into account limitations:

  • use approaches that solve the front-end issue, as best as possible, and then ensure storage issues are resolved
  • be more conservative with the advertised front-end limits, if the database or other storage can’t be adjusted
  • limit the character types that can be entered
  • clearly indicate limitations of the length calculation

Additionally, given the complexity of the issue, it may be worth checking whether a popular JS library already deals with this. I did not find one at the time of writing. Hopefully this will become part of core JavaScript at some point.


Andre M
4

I answered a similar question here.

But basically, here it is:

'😀'.match(/./gu).length == 1

whereas:

'😀'.length == 2

More details are in my original post.
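A slightly more defensive version of the same idea is sketched below. `countCodePoints` is a hypothetical name; the `s` flag is an addition so that line breaks are counted too, and the null check covers the empty string, where `match` returns `null`. Like the one-liner, it counts code points, so joined emoji sequences still count as several.

```javascript
// Hypothetical helper: count code points with a Unicode-aware regex.
// With the `u` flag, `.` matches a whole code point instead of half a surrogate pair.
// With the `s` flag, `.` also matches line terminators.
function countCodePoints(str) {
  const matches = str.match(/./gsu);
  return matches === null ? 0 : matches.length;
}

const grinning = "\u{1F600}"; // a single emoji outside the Basic Multilingual Plane
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // three emojis joined by U+200D

console.log(grinning.length);           // 2
console.log(countCodePoints(grinning)); // 1
console.log(countCodePoints(""));       // 0
console.log(countCodePoints(family));   // 5, not 1
```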

CreaZyp154
2
function fancyCount2(str){
  const joiner = "\u{200D}";
  const split = str.split(joiner);
  let count = 0;

  for(const s of split){
    //removing the variation selectors
    const num = Array.from(s.split(/[\ufe00-\ufe0f]/).join("")).length;
    count += num;
  }

  //assuming the joiners are used appropriately
  return count / split.length;
}
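To see what the final division does, the sketch below repeats the function so that it runs on its own and feeds it one and then two joined sequences. The test strings are illustrative choices, not taken from the question. The result is only a whole number when the string is a single joined sequence, which matches the "assuming the joiners are used appropriately" caveat in the code.

```javascript
// Same logic as the function above, repeated so this snippet is self-contained.
function fancyCount2(str) {
  const joiner = "\u{200D}";
  const split = str.split(joiner);
  let count = 0;

  for (const s of split) {
    // Remove the variation selectors, then count code points.
    const num = Array.from(s.split(/[\ufe00-\ufe0f]/).join("")).length;
    count += num;
  }

  return count / split.length;
}

// "family: man, woman, girl": three emojis joined by two zero width joiners.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(fancyCount2(family));          // 1
console.log(fancyCount2(family + family)); // 1.2, although two emojis are displayed
```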
Laion Camargo
  • 4
    It would have been nice to credit the post where you got that from: https://blog.jonnew.com/posts/poo-dot-length-equals-two The post explains how it works and notes that there are cases where it doesn't work. – joe Jul 28 '20 at 13:33
  • I swear I didn't read that article before, I just thought of using regex myself, but still a must read article – CreaZyp154 Jan 25 '21 at 14:23
1

With a regex that can parse emojis, this can be done easily and without the use of external libraries. Please see the code snippets for examples.

Note that grapheme-splitter as suggested in the question will overcount and split apart compound emojis that contain other emojis, such as this one: ‍‍. This is reported as three distinct "graphemes", ‍ and ‍ and

Here we are using the 'compact', literal version so it'll fit, but there's a safe, long version that uses Unicode escapes as well.

For more info on the regex see also this answer.

/*the pattern (compact version)*/
var emojiPattern = String.raw`(?:‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍||||‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍|‍❤️‍‍|‍❤️‍‍|‍❤️‍‍|‍‍‍|‍‍‍|‍‍‍|‍‍‍|‍‍‍|‍‍‍|‍‍‍|‍‍‍|‍‍‍|‍‍|‍❤️‍|‍❤️‍|‍❤️‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|‍‍|️‍️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|
‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍⚕️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍⚖️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍✈️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|
‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍♂️|‍♂️|‍♂️|‍♂️|‍♂️|‍♀️|‍♀️|‍♀️|‍♀️|‍♀️|‍️|️‍♂️|️‍♀️|️‍♂️|️‍♀️|️‍♂️|️‍♀️|️‍|️‍⚧️|⛹‍♂️|⛹‍♂️|⛹‍♂️|⛹‍♂️|⛹‍♂️|⛹‍♀️|⛹‍♀️|⛹‍♀️|⛹‍♀️|⛹‍♀️|‍|‍|❤️‍|❤️‍|‍♂️|‍♀️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍♀️|‍♂️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍⚕️|‍⚕️|‍⚕️|‍|‍|‍|‍|‍|‍|‍⚖️|‍⚖️|‍⚖️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍✈️|‍✈️|‍✈️|‍|‍|‍|‍|‍|‍|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍|‍|‍|‍|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍|‍|‍|‍|‍|‍|‍|‍|‍|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|⛹️‍♂️|⛹️‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍♂️|‍♀️|‍|‍|‍|‍|‍|‍❄️|‍☠️|‍⬛|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||#️⃣|0️⃣|1️⃣|2️⃣|3️⃣|4️⃣|5️⃣|6️⃣|7️⃣|8️⃣|9️⃣|✋|✋|✋|✋|✋|✌|✌|✌|✌|✌|☝|☝|☝|☝|☝|✊|✊|✊|✊|✊|✍|✍|✍|✍|✍|⛹|⛹|⛹|⛹|⛹|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||☺|☹|☠|❣|❤|✋|✌|☝|✊|✍|⛷|⛹|☘|☕|⛰|⛪|⛩|⛲|⛺|♨|⛽|⚓|⛵|⛴|✈|⌛|⏳|⌚|⏰|⏱|⏲|☀|⭐|☁|⛅|⛈|☂|☔|⛱|⚡|❄|☃|⛄|☄|✨|⚽|⚾|⛳|⛸|♠|♥|♦|♣|♟|⛑|☎|⌨|✉|✏|✒|✂|⛏|⚒|⚔|⚙|⚖|⛓|⚗|⚰|⚱|♿|⚠|⛔|☢|☣|⬆|↗|➡|↘|⬇|↙|⬅|↖|↕|↔|↩|↪|⤴|⤵|⚛|✡|☸|☯|✝|☦|☪|☮|♈|♉|♊|♋|♌|♍|♎|♏|♐|♑|♒|♓|⛎|▶|⏩|⏭|⏯|◀|⏪|⏮|⏫|⏬|⏸|⏹|⏺|⏏|♀|♂|⚧|✖|➕|➖|➗|♾|‼|⁉|❓|❔|❕|❗|〰|⚕|♻|⚜|⭕|✅|☑|✔|❌|❎|➰|➿|〽|✳|✴|❇|©|®|™|ℹ|Ⓜ|㊗|㊙|⚫|⚪|⬛|⬜|◼|◻|◾|◽|▪|▫)`

/*compile the pattern string into a regex*/
let emoRegex = new RegExp(emojiPattern, "g");

/*count of emojis*/
let emoCount = [..."‍♀️‍♀️‍♀️‍♀️‍♀️‍♀️‍⚕️‍⚕️‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍".matchAll(emoRegex)].length

console.log(emoCount) //37

/*modifying the pattern to count other characters too*/
let generalCounter = new RegExp(emojiPattern+"|.", "g") //emoji or regular character
let allCount = [..."$%^ other stuff equalling 28‍♀️‍♀️‍♀️‍♀️‍♀️‍♀️‍⚕️‍⚕️‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍".matchAll(generalCounter)].length

console.log(allCount) //28+37 = 65
Scott Weaver
0

As you can see from the example below, this has to do with Unicode encoding.

There are some great resources, such as the one I took this example from.

https://blog.jonnew.com/posts/poo-dot-length-equals-two

console.log("‍❤️‍‍".length === 11);
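Where the 11 comes from is easier to see when the sequence is broken into code points. The sketch below uses the "kiss: woman, woman" sequence written with escapes as an illustrative choice; `describeCodePoints` is a hypothetical helper name.

```javascript
// "kiss: woman, woman" written with escapes:
// woman, ZWJ, heavy black heart, variation selector-16, ZWJ, kiss mark, ZWJ, woman.
const kiss = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F48B}\u200D\u{1F469}";

// Hypothetical helper: list every code point with the UTF-16 code units it needs.
function describeCodePoints(str) {
  return [...str].map((ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase(),
    utf16Units: ch.length,
  }));
}

const parts = describeCodePoints(kiss);

console.log(kiss.length);  // 11 UTF-16 code units
console.log(parts.length); // 8 code points
console.log(parts);        // three parts need 2 code units each, five need 1
```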
Luke
  • I read that and tested the fancyCount function too, but it didn't work for me. I also read about the fancyCount2 version. – Stackedo Jan 25 '19 at 17:01
0

For anyone interested, I had a similar problem where I wanted to count the length of an emoji at the end of a string.

This is the solution I came up with:

var emoji = new RegExp('(\\p{Extended_Pictographic})((\u200D\\p{Extended_Pictographic})*)$', 'u');

var testStrings = ['‍‍', '', ''];

for(var string = 0; string < testStrings.length; string++){

  var match = testStrings[string].match(emoji);
  var chars = match == null ? 0 : match[0].length;
  
  console.log(testStrings[string] + ': ' + chars);
  
}

Explanation: \\p{Extended_Pictographic} matches a single emoji, which often consists of two UTF-16 code units. A compound emoji, such as a family emoji, consists of several emojis combined by zero width joiners (\u200D).

The regex matches any emoji at the end ($). If there is a match, its length is counted. I am sure it could be adapted for your use case by matching all emojis in a given string and then subtracting the surplus. It's not a complete implementation for your particular question, but I hope this gets you on the right track.

Ood
0

Use lodash's toArray method:

console.log(_.toArray("‍‍").length); // 1
console.log(_.toArray("‍‍‍♂️‍‍").length); // 3

Check here for Codesandbox

Anjan Talatam
-1

I suggest using the runes package for correct handling of multi-byte strings; otherwise you will run into more issues, for example when using reducers to reverse strings.

Take a look at this great small package: runes

Bojoer
  • Looks very promising; however, the latest update of this package was 4 years ago (2 Oct 2017) – Kosmonaft May 16 '21 at 23:25
  • Yeah, I know it's been 4 years since a new version was released, but it still works as intended and has no peer dependency deprecations, so for now I still use it for multi-byte string conversions instead of loading the string into a Buffer with the correct encoding. – Bojoer May 17 '21 at 00:45
  • Fair enough. Instead of using an additional package (runes or emoji-regex) I used lodash (as it's already included in my project). I came to this solution thanks to this answer on [stackoverflow](https://stackoverflow.com/a/64138318/1732989) – Kosmonaft May 17 '21 at 03:41
  • Great, and indeed you can also use lodash if it's already included and you use other lodash methods; otherwise I think lodash is overkill, and with many packages I don't want too much bloat. But lodash is amazing. I also used momentjs for datetime manipulation, but for many smaller projects that don't really need localization etc. I simply use the native ES6+ methods. I did that with a lot of packages that were convenient rather than needed, and yes, performance increased by 20%, so that is significant. – Bojoer May 17 '21 at 22:37