
Sorry if the question is silly. I need to count the real number of characters in a string. For example, I have the following string:

हा य

With .length I get the following result:

let chars = "हा य";
console.log(chars.length);

for (let i = 0; i < chars.length; i++) {
  console.log(chars[i]);
}

[Screenshot of the console output]

As you can see, it's wrong. I have three characters here:

This one: हा, this one (an empty space), and this one: य. What is the proper and shortest way to count them correctly?

Scott Marcus
Aleksej_Shherbak
  • this character is hidden, "ह" – Jovylle Jun 18 '20 at 16:58
  • `"हा"` consists of two characters: `"ह"` and `"ा"`. – alani Jun 18 '20 at 16:59
  • @JovylleBermudez No. The first character is being treated as two by breaking it into separate parts. – Scott Marcus Jun 18 '20 at 16:59
  • https://coolaj86.com/articles/how-to-count-unicode-characters-in-javascript/ – epascarello Jun 18 '20 at 17:01
  • It's two chars, right? – Ashik Paul Jun 18 '20 at 17:03
  • @AshikPaul There's a space between them. – Barmar Jun 18 '20 at 17:03
  • @palaѕн What do you mean? `हा` - one, ` ` - two, `य` - three. Am I wrong? – Aleksej_Shherbak Jun 18 '20 at 17:04
  • Is this what you are looking for? https://stackoverflow.com/questions/26689852/how-to-break-hindi-string-in-array-with-php-and-count-how-many-letter-and-vowel – Kunal Vohra Jun 18 '20 at 17:04
  • @Aleksej_Shherbak, no `हा` is two. The other two are one each. – alani Jun 18 '20 at 17:05
  • So, you mean `ह` & `हा` mean the same thing? – palaѕн Jun 18 '20 at 17:05
  • I don't know the Hindi alphabet, but I am guessing that in order to avoid a large number of different characters for many combinations, the `ा` is a separate modifier character. Compare with the Unicode for phonetic alphabet symbols - see https://ipa.typeit.org/full/ and try typing a letter then using one of the modifiers on the bottom row of symbols, and then press backspace to remove them one at a time. – alani Jun 18 '20 at 17:09
  • I just don't know Hindi. Is there some language rule under which `हा` is two chars? – Aleksej_Shherbak Jun 18 '20 at 17:10
  • I don't think the modifier counts as a separate character, any more than e.g. `é` counts as two characters even though it consists of character `e` and a modifier `´`. – Guy Incognito Jun 18 '20 at 17:11
  • @GuyIncognito The number of possible accented letters in European languages (combinations of letter plus accent) is not that large, so it is feasible to have separate characters for them all. I am guessing that there is a bigger matrix of possible combinations in this case, although that is speculation because I don't know any Hindi. – alani Jun 18 '20 at 17:14
  • @Aleksej_Shherbak So I guess that from a practical point of view, if what you mean by "real length" is such that `हा` ought to count as one, then you would have to find out what modifier characters exist, and exclude them from the count. – alani Jun 18 '20 at 17:20
  • Thank you for the discussion!!! Could someone please tell me, does this symbol `ा` exist only in Hindi or not? – Aleksej_Shherbak Jun 18 '20 at 17:21
  • @Aleksej_Shherbak You would need an expert in Asian languages. The Devanagari script is used for many languages in addition to just Hindi. – alani Jun 18 '20 at 17:22
  • Actually, I can count letters in the string without counting this symbol `ा`. But I'm not sure whether it will work for other languages. – Aleksej_Shherbak Jun 18 '20 at 17:24
  • Well I am guessing that if it could ever be meaningfully used standalone, then fonts would not have been designed in such a way that if you do so then you get a circle displayed next to it (clearly in order to represent where the character being modified would be). So I would guess that it is only ever used in combination with whatever it is attached to. But there is more information at https://hindilanguage.info/lessons/lesson-2-devanagari-vowels/ - see the bit about maatraa forms. – alani Jun 18 '20 at 17:31
  • So it seems likely that without a preceding consonant `आ` would be used instead, in any language which uses that script. This is based on the web page that I just linked. – alani Jun 18 '20 at 17:32
  • You could also argue that as `हा` actually represents two sounds (consonant followed by vowel) it is reasonable to count it as two in any case. – alani Jun 18 '20 at 17:34
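
The idea raised in the comments, leaving modifier characters out of the count, can be sketched with a Unicode property escape instead of a hand-made list of modifiers. This is only a sketch: `countWithoutMarks` is a hypothetical name, and it assumes that "real length" means every code point in the Unicode Mark category (such as `ा`, U+093E) should not be counted.

```javascript
// Hypothetical helper: counts code points that are not combining marks.
// Assumption: marks such as "ा" (U+093E) should not add to the length.
function countWithoutMarks(text) {
  // \p{M} matches any code point in the Unicode "Mark" general category;
  // the "u" flag is required for property escapes.
  const withoutMarks = text.replace(/\p{M}/gu, "");
  // Spreading iterates by code point, so a surrogate pair counts once.
  return [...withoutMarks].length;
}

console.log(countWithoutMarks("हा य")); // 3
```

Because the virama (U+094D) is also a mark, a conjunct such as `क्ष` would be counted as two with this approach, which may or may not match what is wanted.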

1 Answer


Hope this helps.

For all languages, you can try "Count number of characters present in foreign language". The function below counts only Devanagari letters (char codes 2309 to 2361) and spaces:

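function countChars(text) {
  return text.split("").filter(function (char) {
    const charCode = char.charCodeAt(0);
    // Keep Devanagari letters (U+0905 to U+0939) and spaces (32).
    // Vowel signs such as "ा" (U+093E) fall outside this range.
    return (charCode >= 2309 && charCode <= 2361) || charCode === 32;
  }).length;
}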

let chars = "हा य";
console.log(countChars(chars));
Ashik Paul
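
As an alternative to filtering by a fixed char code range, the count can be based on grapheme clusters, which is roughly what a reader perceives as one character and is not tied to a single script. The sketch below assumes a runtime that provides `Intl.Segmenter` (recent browsers and Node.js versions); `countGraphemes` is a hypothetical name, not something from the answer above.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
// Assumption: the runtime supports Intl.Segmenter.
function countGraphemes(text) {
  // No locale is given, so the runtime's default locale is used.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  // segment() returns an iterable with one entry per grapheme cluster.
  for (const _segment of segmenter.segment(text)) {
    count++;
  }
  return count;
}

console.log(countGraphemes("हा य")); // 3
```

Here `हा` is one cluster because the vowel sign attaches to the preceding consonant, so the result is 3 rather than the 4 reported by .length.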