3

The Unicode string in question:

ਪਹਿਲਾਂ ਲੋਕਾਂ ਦੇ ਦਿਲਾਂ ਦਿਮਾਗ਼ਾਂ ਚੋਂ ਇਹ ਵਹਿਮ ਕੱਢੋ ਕਿ 
ਅਸੀਂ ਹਿੰਦੂ ਹਾਂ,
ਅਸੀਂ ਸਿੱਖ ਹਾਂ,
ਅਸੀਂ ਮੁਸਲਮਾਨ ਹਾਂ,
ਅਸਲੀਅਤ ਇਹ ਹੈ ਕਿ 
ਅਸੀਂ ਭੁੱਖੇ ਹਾਂ, 
ਅਸੀਂ ਬੇਰੁਜ਼ਗਾਰ ਹਾਂ, 
ਅਸੀਂ ਨਸ਼ੇੜੀ ਹਾਂ, 
ਅਸੀਂ ਲਾਚਾਰ ਹਾਂ, 
ਅਸੀਂ ਬੇਵਕੂਫ਼ ਹਾਂ, 
ਅਸੀਂ ਬੀਮਾਰ ਹਾਂ, 

Language: Punjabi
Format: Unicode

Problem:
JavaScript reports `inputStr.length` = 226, whereas WhatsApp says 700 characters.

JavaScript code:

console.log(inputStr.length);

WhatsApp screenshot 01: fine, no error (count at 698 or 699)

Just adding a new line from the mobile keyboard pushes it over 700:

WhatsApp screenshot 02

  1. Why is there such a big mismatch in the string length?
  2. Which number is correct?
  3. How can I get the same result in JavaScript as in WhatsApp?
DavChana

2 Answers

1
  1. This seems to be a bug in WhatsApp. If you enter text with newlines, you will find that on the 15th line it reports that the 700-character limit is exceeded, no matter what the current character count is.

    If you enter your string without the newline characters, it works and does not show the "700 characters exceeded" message (as shown in the image attached below).

  2. The correct length is the JavaScript string length, which is 226.

Your text without newlines
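
To compare the two measurements discussed in this answer for any given message, here is a minimal sketch that reports the JavaScript length with and without newlines, plus the number of lines. The helper name `describeMessage` is hypothetical, and nothing here reproduces WhatsApp's actual counting logic, which is not documented in this thread.

```javascript
// Hypothetical helper: collects the numbers discussed above for one message.
function describeMessage(text) {
  // Split on either Unix or Windows style line breaks.
  const lines = text.split(/\r?\n/);
  return {
    // What String.prototype.length reports (UTF-16 code units).
    codeUnits: text.length,
    // Number of lines, i.e. line breaks + 1.
    lineCount: lines.length,
    // Length once every line break is removed.
    codeUnitsWithoutNewlines: text.replace(/\r?\n/g, '').length,
  };
}

const sample = 'ਅਸੀਂ ਹਿੰਦੂ ਹਾਂ,\nਅਸੀਂ ਸਿੱਖ ਹਾਂ,';
console.log(describeMessage(sample));
```

If the line count, rather than the character count, is what triggers the message, `lineCount` is the value to watch while `codeUnits` stays far below 700.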

Community
willi123yao
  • You are correct that the 15th new line, with nothing else, gives that "700 chars" error in WhatsApp. – DavChana Sep 04 '19 at 04:28
1

That's probably because of the way 'length' is calculated. Punjabi, like any other non-Latin script, is encoded in Unicode, and Unicode does not store every character in a fixed number of bytes: in UTF-8 a single code point takes anywhere from 1 to 4 bytes. So, for example, 'ਕਿ' appears to be a single character, but it is made of two code points and takes 6 bytes in UTF-8. Note that JavaScript's `length` counts UTF-16 code units, not bytes and not visible characters.

Check out this post for more details.
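
As a rough illustration of how the same text can have several "lengths", the sketch below measures 'ਕਿ' in three ways using standard JavaScript APIs. The string is written with escapes so the exact code points are unambiguous; it assumes a runtime where `TextEncoder` is available as a global (modern browsers and recent Node.js versions).

```javascript
// 'ਕਿ' is the letter ਕ (U+0A15) followed by the vowel sign ਿ (U+0A3F).
const s = '\u0A15\u0A3F';

// UTF-16 code units: this is what String.prototype.length reports.
const codeUnits = s.length;

// Unicode code points: spreading a string iterates by code point.
const codePoints = [...s].length;

// UTF-8 bytes: TextEncoder always encodes to UTF-8.
const utf8Bytes = new TextEncoder().encode(s).length;

console.log(codeUnits, codePoints, utf8Bytes); // 2 2 6
```

Each Gurmukhi code point fits in one UTF-16 code unit but needs three bytes in UTF-8, which is why the byte count is three times the `length`.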

Sukhi
  • You are right, but just to mention that `ਕਿ` is actually two characters, or more specifically one letter plus one vowel sign. In Punjabi, AEIOU and the other vowels are not letters, just extra symbols attached to regular consonant letters. So `ਕਿ` is Ki and `ਕ` is K – DavChana Sep 04 '19 at 04:59
  • Totally understood (especially coming from a Marathi background). However, that's not how Unicode works. Check out [this](https://mothereff.in/byte-counter) link, which calculates the byte length of Unicode text. The string ਕਿ has two characters but takes 6 bytes. – Sukhi Sep 04 '19 at 05:02
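
The difference raised in these comments, between what a reader perceives as one unit and what Unicode stores, can be made concrete with grapheme segmentation. This is a sketch only: it assumes the runtime provides `Intl.Segmenter` (recent browsers and Node.js 16 or later), and `graphemeCount` is a hypothetical helper name.

```javascript
// Counts user-perceived characters (extended grapheme clusters).
// Assumes Intl.Segmenter exists in the current runtime.
function graphemeCount(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

const ki = '\u0A15\u0A3F'; // ਕਿ: consonant ਕ plus vowel sign ਿ
console.log(ki.length, graphemeCount(ki));
```

Here `length` counts the consonant and the vowel sign separately, while the segmenter groups the vowel sign with the consonant it attaches to.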