5

Given a set of words, I need to put them in a hash keyed on the first letter of each word. I have words = {}, with keys A..Z, plus 0 for numbers and symbols. I was doing something like this:

var firstLetter = name.charAt(0);
    firstLetter = firstLetter.toUpperCase();

if (firstLetter < "A" || firstLetter > "Z") {
    firstLetter = "0";
}
if (words[firstLetter] === undefined) {
    words[firstLetter] = [];
} 
words[firstLetter].push(name);

But this fails for letters with a diaeresis or other diacritics, as in the word Ärzteversorgung. That word is put in the "0" array; how can I put it in the "A" array?

ilanco
cdarwin
  • Do you only want to have characters like Ä detected as letters, or do you want to have Ä detected as if it were an A? – Bergi May 22 '12 at 18:37
  • Ä is not an A. You will need a mapping of characters with accents to without accents. – Starkey May 22 '12 at 18:37
  • You have to map Ä and the other accented variants of each letter too, just as you already do for 0 1 2 3 ... and a b c etc. – Rizstien May 22 '12 at 18:41
  • Check out this post http://stackoverflow.com/questions/863800/replacing-diacritics-in-javascript – Heitor Chang May 22 '12 at 18:42
  • Would using a regex test, like `/[\w\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]/.test(firstLetter)` be appropriate? – Chris Carew May 22 '12 at 18:46
  • @HeitorChang this duplicate has better answers: http://stackoverflow.com/questions/990904/javascript-remove-accents-in-strings – ajax333221 May 22 '12 at 19:02
  • I need to map Ä to A. The link suggested by ajax333221 has answers that solve my question, since you have to transform the string using some kind of map. – cdarwin May 22 '12 at 20:13
  • you may want to remove accents & then do a simple [a-z] check. see http://stackoverflow.com/questions/990904/javascript-remove-accents-in-strings – Adriano Sep 15 '14 at 13:35
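
The accent-stripping idea from the comments above can be sketched with the built-in String.prototype.normalize (ES2015), which splits "Ä" into "A" plus a combining mark. This is only a sketch under assumptions: the helper names bucketKey and groupByFirstLetter are hypothetical, and letters with no canonical decomposition (Ø, Æ, Þ and similar) still end up in "0" unless you add an explicit map for them.

```javascript
// Hypothetical helper: returns the bucket key ("A".."Z" or "0") for a word.
function bucketKey(word) {
    // NFD decomposes "Ä" into "A" + U+0308 (combining diaeresis), so the
    // first UTF-16 unit is the base letter. Uppercasing before charAt(0)
    // also keeps "ß" (which uppercases to "SS") down to a single character.
    var first = word.normalize("NFD").toUpperCase().charAt(0);
    return /^[A-Z]$/.test(first) ? first : "0";
}

// Hypothetical helper: groups an array of names by their bucket key.
function groupByFirstLetter(names) {
    var words = {};
    names.forEach(function (name) {
        var key = bucketKey(name);
        if (words[key] === undefined) {
            words[key] = [];
        }
        words[key].push(name);
    });
    return words;
}
```

With this, groupByFirstLetter(["Ärzteversorgung", "apple", "42nd"]) puts the first two names under "A" and the third under "0".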

5 Answers

16

You can use this to test if a character is likely to be a letter:

var firstLetter = name.charAt(0).toUpperCase();
if( firstLetter.toLowerCase() != firstLetter) {
    // it's a letter
}
else {
    // it's a symbol
}

This works because JavaScript already has a mapping for lowercase to uppercase letters (and vice versa), so if a character is unchanged by toLowerCase() then it's not in the letter table.

Niet the Dark Absol
  • An interesting trick, but as you emphasize, it only “likely” works. It might still be usable if you add ad hoc checks for every character that may appear and that the simple test gets wrong. In the Latin 1 range, the following characters get misclassified: º and ª (masculine and feminine ordinal indicators), sharp ß (probably the most relevant character here), and debatably the micro sign µ (formally a letter in Unicode, compatibility equivalent to Greek letter mu, but widely understood as a special character rather than a letter). – Jukka K. Korpela May 22 '12 at 19:07
  • It only works for characters in bicameral scripts, i.e. writing systems that make an uppercase/lowercase distinction; most scripts don’t (e.g., Hebrew, Devanagari, Chinese). – Jukka K. Korpela Nov 03 '12 at 14:02
  • @JukkaK.Korpela - Yes, there are cons, and the pro is the speed of the check. This can be processed much faster than anything else, and English will likely be (and should be) the only language most people need. – vsync Jan 13 '14 at 00:44
  • @vsync, the question mentions the sample word “Ärzteversorgung”. It isn’t English. Typically, if people only think of English, they don’t even ask this question; they just assume that `[A-Za-z]` covers all letters. – Jukka K. Korpela Jan 13 '14 at 09:46
  • The question body on Stack Overflow sometimes matters little: I came here from Google, and what the OP asked for is not what the title suggests. It's therefore important that the answers also cover people who arrive here from Google. – vsync Jan 13 '14 at 13:40
  • As of 2020: In a latin script you can use the uppercase/lowercase comparison to mark a character as a letter. Try this: 'ß'.toUpperCase() // ==> 'SS', 'ſ'.toUpperCase() // ==> 'S'. Even in Greek: 'µ'.toUpperCase() // ==> 'Μ' (for \u00B5 as well as \u03BC). Only for the ordinal indicators the lowercase === uppercase. – Onno van der Zee Apr 13 '20 at 22:29
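
The limits raised in these comments are easy to check directly. Below is a minimal sketch of the same case-comparison idea; the helper name hasCase and the sample list are hypothetical and chosen only to exercise the edge cases mentioned above (ß, µ, the ordinal indicators, and scripts without case).

```javascript
// Hypothetical helper: a character "has case" if upper-casing and
// lower-casing it produce different strings.
function hasCase(ch) {
    return ch.toUpperCase() !== ch.toLowerCase();
}

// Sample characters: cased Latin letters, the Latin 1 edge cases from the
// comments, caseless scripts (Hebrew, Chinese), a digit and a symbol.
var samples = ["a", "Ä", "ß", "µ", "º", "ª", "א", "中", "7", "#"];
samples.forEach(function (ch) {
    console.log(JSON.stringify(ch), hasCase(ch) ? "has case" : "no case mapping");
});
```

Characters from scripts without case, such as Hebrew or Chinese, come out as "no case mapping" even though they are letters, so the test is only a quick heuristic for bicameral scripts.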
5

Try converting the character to uppercase and to lowercase, then check whether the two results differ. Only letters change when converted to upper and lower case (numbers, punctuation marks, etc. don't). Below is a sample function built on this idea:

function isALetter(charVal)
{
    // A letter (in a script that has case) differs between its two forms.
    return charVal.toUpperCase() !== charVal.toLowerCase();
}
JDE
4

You could use a regular expression. Unfortunately, JavaScript does not consider international characters to be "word characters". But you can do it with the regular expression below:

var firstLetter = name.charAt(0);
firstLetter = firstLetter.toUpperCase();
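// The listed characters must sit inside a character class [...]; without the
// brackets the pattern only matches that exact sequence of characters.
// Note that \w also matches digits and "_".
if (!firstLetter.match(/^[\wÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöüçÇßØøÅåÆæÞþÐð]$/)) {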
    firstLetter = "0";
}
if (words[firstLetter] === undefined) {
    words[firstLetter] = [];
} 
words[firstLetter].push(name);
jnrbsn
2

You can use .charCodeAt(0) to get the character's UTF-16 code unit (identical to ASCII for values below 128) and then do some range checks.

The ranges you are looking for are probably 65-90 and 97-122 for unaccented letters, plus 192-214, 216-246 and 248-255 for accented Latin-1 letters (all inclusive; 215 and 247 are the × and ÷ signs), but double-check this against a Unicode chart.

Something like this:

if((x>64&&x<91)||(x>96&&x<123)||(x>191&&x<256&&x!==215&&x!==247))

where x is the char code.

ajax333221
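
A range check like the one in this answer can be wrapped in a small function. This is a sketch under assumptions: charCodeAt returns UTF-16 code units, so the accented ranges used here are taken from the Unicode Latin-1 Supplement block rather than from an extended-ASCII code page, and the helper name isLatin1Letter is hypothetical. Letters outside Latin-1 (for example Ł or Greek) are not covered.

```javascript
// Hypothetical helper: true if the first character of ch is a letter in
// the Basic Latin or Latin-1 Supplement blocks.
function isLatin1Letter(ch) {
    var x = ch.charCodeAt(0); // NaN for an empty string, so all tests fail
    return (x >= 65 && x <= 90) ||     // A-Z
           (x >= 97 && x <= 122) ||    // a-z
           (x >= 192 && x <= 214) ||   // À-Ö
           (x >= 216 && x <= 246) ||   // Ø-ö (skips × at 215)
           (x >= 248 && x <= 255);     // ø-ÿ (skips ÷ at 247)
}
```

Note that this only says whether the character is a letter; mapping "Ä" to the "A" bucket still needs a separate accent-removal step.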
2

This is fortunately now possible without external libraries. Straight from the docs:

let story = "It’s the Cheshire Cat: now I shall have somebody to talk to.";

// Most explicit form
story.match(/\p{General_Category=Letter}/gu);

// It is not mandatory to use the property name for General categories
story.match(/\p{Letter}/gu);
adam.baker
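
Applied to the original question, the Unicode property escape can test just the first character of a word. A minimal sketch, assuming an engine with ES2018 support for \p{...} under the u flag; the helper name startsWithLetter is hypothetical.

```javascript
// Hypothetical helper: true if the word begins with a Unicode letter.
// The u flag makes the regex work on whole code points, so letters
// outside the Basic Multilingual Plane are handled as well.
function startsWithLetter(word) {
    return /^\p{Letter}/u.test(word);
}

["Ärzteversorgung", "42nd", "שלום", "#tag"].forEach(function (w) {
    console.log(w, startsWithLetter(w));
});
```

This answers "is it a letter?" for any script, but it does not fold "Ä" into "A"; that still requires removing the diacritic before choosing the key.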