Okay, here is a slightly dirty way, but maybe it will help you find a better solution:
Let's suppose that we have
string = "📙 123"
Where
JavaScript output for string[3]:
→ 1
Python output for string[3]:
→ 2
Why does this happen?
Python treats the emoji as a single character, but JavaScript treats it as two, because JavaScript stores strings as UTF-16 code units and the emoji is a surrogate pair.
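To see the difference from Python itself, here is a minimal check; the UTF-16 encoding step is only a stand-in for how JavaScript stores strings:

string = "📙 123"
print(len(string))                           # 5: Python counts code points
print(len(string.encode('utf-16-le')) // 2)  # 6: UTF-16 code units, which is what JavaScript's .length counts
print(string[3])                             # '2' in Python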
Let's see what this string looks like in its JavaScript-style escaped form:
import json
print(json.dumps(string).strip('"'))
And the output will be:
# the repr of this string is '\\ud83d\\udcd9 123'; the doubled backslash (\\) means \u here is literal text, not a Unicode escape
\ud83d\udcd9 123
If you paste this line into a browser's console as a string literal, you will get the emoji back.
So if we replace each \uXXXX escape with X, for example, the string length will match JavaScript's count.
Let's do it with regex:
import json
import re
new_string = re.sub(r'\\u[0-9a-f]{4}', 'X', json.dumps(string).strip('"'))
print(new_string)
And the output will be XX 123, and voilà, new_string[3] will be 1. Same as JavaScript.
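If you need this trick in more than one place, it can be wrapped into a small helper (js_style_index is just a name I made up for this sketch):

import json
import re

def js_style_index(s, i):
    # Escape non-ASCII characters to \uXXXX via json.dumps, collapse each
    # escape into a single placeholder 'X', then index like JavaScript would.
    escaped = json.dumps(s).strip('"')
    return re.sub(r'\\u[0-9a-f]{4}', 'X', escaped)[i]

print(js_style_index("📙 123", 3))  # 1, same as JavaScript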
But be careful: this solution replaces every non-ASCII character with X, so the original characters are lost. Only ASCII characters can be read back this way.
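If you also need the non-ASCII characters to survive, a less destructive variation (my own sketch, not part of the trick above) is to read the UTF-16 code units directly:

def js_char_at(s, i):
    # Encode to UTF-16-LE and take the i-th 2-byte code unit; 'surrogatepass'
    # keeps a lone half of a surrogate pair instead of raising an error.
    units = s.encode('utf-16-le')
    return units[2 * i:2 * i + 2].decode('utf-16-le', 'surrogatepass')

print(js_char_at("📙 123", 3))                 # 1
print(len("📙 123".encode('utf-16-le')) // 2)  # 6, same as JavaScript's string.length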
Some info that may help you: 1, 2, 3
If you are able to edit the JavaScript side, I recommend using var chars = Array.from(string). That will produce the correct sequence of characters: [ "📙", " ", "1", "2", "3" ]