count of word occurrences in a list, case insensitive

Question

What is the most professional way to obtain a case insensitive count of the distinct words contained in an array using plain javascript? I have done the first attempt myself but does not feel much professional.

I would like the result to be a Map

Imo, .forEach on the array, lowercase each word and count it in a dictionary. — Rani Sharim, Oct 12 '21 at 20:21
Thank you Rani Sharim. I was actually wondering about the possible use the reduce() — , Oct 12 '21 at 20:24
You already asked this question [Count occurrences of distinct words in array using a Map [closed]](https://stackoverflow.com/questions/69543628/count-occurrences-of-distinct-words-in-array-using-a-map) If you didn't get the answers you were looking for you should edit the question not post yet another [duplicate](https://stackoverflow.com/questions/5667888/counting-the-occurrences-frequency-of-array-elements) — pilchard, Oct 12 '21 at 21:11
@pilchard the previous question was about enhancing my beginner coding attempt. And as suggested has been reposted in the suggested forum (code review). So a new question has been reformulated to suit this forum. Thank you for your kind concern and guidance. — , Oct 12 '21 at 21:15
Also all of these answers use `toLowerCase` but `localeCompare` with case insensitivity set is actually a better comparison, with `toUpperCase` coming in second and `toLowerCase` third. see [Upper vs Lower Case](https://stackoverflow.com/questions/234591/upper-vs-lower-case) and [https://stackoverflow.com/questions/26877514/is-it-better-to-compare-strings-using-tolowercase-or-touppercase-in-javascript](Is it better to compare strings using toLowerCase or toUpperCase in JavaScript?) — pilchard, Oct 12 '21 at 21:16
@pilchard Thank you. You mean it's better to use toUpperCase? — , Oct 12 '21 at 21:18

Spectric · Accepted Answer · 2021-10-12T20:41:25.050

2

You can use Array.reduce to store each word as a property and the occurrence of each as the value.

In the reducer function, check whether the letter (converted to lowercase) exists as a property. If not, set its value to 1. Otherwise, increment the property value.

const arr = ["a", "A", "b", "B"]

const result = arr.reduce((a,b) => {
  let c = b.toLowerCase();
  return a[c] = a[c] ? ++a[c] : 1, a;
}, {})

console.log(result)

_{As a one liner: const result = arr.reduce((a,b) => (c = b.toLowerCase(), a[c] = a[c] ? ++a[c] : 1, a), {})}

To convert it to a Map, you can use Object.entries (sugged by @Théophile):

const arr = ["a", "A", "b", "B"]

const result = arr.reduce((a, b) => {
  let c = b.toLowerCase();
  return a[c] = a[c] ? ++a[c] : 1, a;
}, {})

const m = new Map(Object.entries(result))
m.forEach((value, key) => console.log(key, ':', value))

edited Oct 12 '21 at 20:41

answered Oct 12 '21 at 20:24

Spectric

30,714
6
20
43

1

thank you! very nice. Can I get explicitly a map as an output (distinct values, count)? – Oct 12 '21 at 20:26
Optimally, I would like to get as output a Map containing as keys the distinct words and as values the respective frequency (count) – Oct 12 '21 at 20:28
2

@Pam If `result` is an object, try `const m = new Map(Object.entries(result))` – Théophile Oct 12 '21 at 20:32
@Franjo Pintarić thank you for saying so. I am coming from other languages, and frankly, this use of "online functions" does feel a bit esoteric. I was wondering actually if it also feels professional to the people working in the field :-) – Oct 12 '21 at 20:32

score 0 · Answer 2 · answered Oct 12 '21 at 20:45

0

You can use an object to store the results and then create a Map object by passing that object to Object.entries

const arr = ["c", "A", "C", "B", "b"];

const counts = {};
for (const el of arr) {
  let c = el.toLowerCase();
  counts[c] = counts[c] ? ++counts[c] : 1;
}

console.log(counts);

const map = new Map(Object.entries(counts))
map.forEach((k,v) => console.log(k,v))

answered Oct 12 '21 at 20:45

Ran Turner

14,906
5
47
53

May quickly ask why ++ precedes counts[c] and not follows it ? – Oct 12 '21 at 20:53
1

Sure. Please read this thread:https://stackoverflow.com/questions/3469885/somevariable-vs-somevariable-in-javascript – Ran Turner Oct 12 '21 at 20:55
ah yes, I was aware of the ++"pre" and "post"++ increment. I was wondering why here we need to have a "pre". Would we count 1 less ? – Oct 12 '21 at 20:58
1

the result will be 1 for all keys because it will first evaluates the value and only then increment and store it – Ran Turner Oct 12 '21 at 21:01
1

If we do it with ++counts[c] way it will increments the value, then evaluates and store it, this is what we would like (: – Ran Turner Oct 12 '21 at 21:02

pilchard · Answer 3 · 2021-10-13T13:34:36.170

While the accepted 'group-by' operation is fine, it doesn't address the complexity of case-insensitive/unicode comparison.

First of all, you can reduce directly into a Map, here counting characters as they are without accounting for case-insensitivity or unicode variations resulting in 20 'distinct' characters being counted from an array of length 24.

const input = [ 'a', 'A', 'b', 'B', '\u00F1', '\u006E\u0303', 'İ', 'i', 'Gesäß',
  'GESÄSS', '\u0399', '\u1FBE', '\u00E5', '\u212B', '\u00C5', '\u212B', '\u0399', '\u1FBE', '\u03B9', '\u1FBE', '\u03B2', '\u03D0', '\u03B5', '\u03F5', ];

const result = input.reduce((a, b) => a.set(b, (a.get(b) ?? 0) + 1), new Map());

console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

Based on the samples below, the method that results in the most compact count is using word.normalize().toLocaleUpperCase() and passing Turkey('tr') as a locale for this specific sample array. It results in 9 'distinct' characters being counted from an array of length 24, properly handling different encodings for ñ, equivalent spellings of Gesäß(GESÄSS), and accounting for locale specific case changes (i to İ)

const input = [ 'a', 'A', 'b', 'B', '\u00F1', '\u006E\u0303', 'İ', 'i', 'Gesäß',
  'GESÄSS', '\u0399', '\u1FBE', '\u00E5', '\u212B', '\u00C5', '\u212B', '\u0399', '\u1FBE', '\u03B9', '\u1FBE', '\u03B2', '\u03D0', '\u03B5', '\u03F5', ];

const result_normalize_locale = input.reduce((a, b) => {
  const w = b.normalize().toLocaleUpperCase('tr');

  return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());

console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

Using this simple 'group-by' we can look at the variations between the available case comparison methods: toLowerCase(), toLocaleLowerCase(), toUpperCase(), and toLocaleUpperCase() and unicode variations can be accounted for using normalize()

To lower case

toLowerCase() – 15 'distinct' characters.

toLocaleLowerCase() – 14 'distinct' characters, in this case specifying Turkey('tr') as locale.

normalize().toLocaleLowerCase() – 12 'distinct' characters, again with 'tr' as locale.

const input = [ 'a', 'A', 'b', 'B', '\u00F1', '\u006E\u0303', 'İ', 'i', 'Gesäß',
  'GESÄSS', '\u0399', '\u1FBE', '\u00E5', '\u212B', '\u00C5', '\u212B', '\u0399', '\u1FBE', '\u03B9', '\u1FBE', '\u03B2', '\u03D0', '\u03B5', '\u03F5', ];
// ['a', 'A', 'b', 'B', 'ñ', 'ñ', 'İ', 'i', 'Gesäß', 'GESÄSS', 'Ι', 'ι', 'å', 'Å', 'Å', 'Å', 'Ι', 'ι', 'ι', 'ι', 'β', 'ϐ', 'ε', 'ϵ', ]
// input.length: 24

// grouping by toLowerCase()
const result = input.reduce((a, b) => {
  const w = b.toLowerCase();

  return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());

// grouping by toLocaleLowerCase('tr') [Turkey]
const result_locale = input.reduce((a, b) => {
  const w = b.toLocaleLowerCase('tr');

  return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());

// grouping by normalize().toLocaleLowerCase('tr') [Turkey]
const result_normalize_locale = input.reduce((a, b) => {
  const w = b.normalize().toLocaleLowerCase('tr');

  return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());

// log toLowerCase() result - 15 'distinct' characters
console.log('toLowerCase() '); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

// log toLocaleLowerCase('tr') result - 14 'distinct' characters
console.log("\ntoLocaleLowerCase('tr')"); console.log('distinct count: ', result_locale.size); console.log('Map(',result_locale.size,') {', [...result_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

// log normalize().toLocaleLowerCase('tr') result - 12 'distinct' characters
console.log("\nnormalize().toLocaleLowerCase('tr')"); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

.as-console-wrapper { max-height: 100% !important; top: 0; }

To upper case

toUpperCase() – 12 'distinct' characters.

toLocaleUpperCase() – 11 'distinct' characters, in this case specifying Turkey('tr') as locale.

normalize().toLocaleUpperCase() – 9 'distinct' characters, again with 'tr' as locale.

const input = [ 'a', 'A', 'b', 'B', '\u00F1', '\u006E\u0303', 'İ', 'i', 'Gesäß',
  'GESÄSS', '\u0399', '\u1FBE', '\u00E5', '\u212B', '\u00C5', '\u212B', '\u0399', '\u1FBE', '\u03B9', '\u1FBE', '\u03B2', '\u03D0', '\u03B5', '\u03F5', ];
// ['a', 'A', 'b', 'B', 'ñ', 'ñ', 'İ', 'i', 'Gesäß', 'GESÄSS', 'Ι', 'ι', 'å', 'Å', 'Å', 'Å', 'Ι', 'ι', 'ι', 'ι', 'β', 'ϐ', 'ε', 'ϵ', ]
// input.length: 24

// grouping by toUpperCase() 
const result = input.reduce((a, b) => {
  const w = b.toUpperCase();

  return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());

// grouping by toLocaleUpperCase('tr') [Turkey]
const result_locale = input.reduce((a, b) => {
  const w = b.toLocaleUpperCase('tr');

  return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());

// grouping by normalize().toLocaleUpperCase('tr') [Turkey]
const result_normalize_locale = input.reduce((a, b) => {
  const w = b.normalize().toLocaleUpperCase('tr');

  return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());

// log toUpperCase() result - 12 'distinct' characters
console.log('toUpperCase() '); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

// log toLocaleUpperCase('tr') result - 11 'distinct' characters
console.log("\ntoLocaleUpperCase('tr')"); console.log('distinct count: ', result_locale.size); console.log('Map(',result_locale.size,') {', [...result_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

// log normalize().toLocaleUpperCase('tr') result - 9 'distinct' characters
console.log("\nnormalize().toLocaleUpperCase('tr')"); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');

.as-console-wrapper { max-height: 100% !important; top: 0; }

Thanks a lot!, I just saw this. Let me try digest a bit this precious bulk of information, and I will get back to you. It's Very kind of you to provide all this insight :-) — , Oct 13 '21 at 14:08

score -1 · Answer 4 · answered Oct 12 '21 at 20:31

-1

use set to get rid of duplicates and the spread operator to put it back in an array.

const  myarray = ['one', 'One', 'two', 'TWO', 'three'];
const noDupes = [... new Set( myarray.map(x => x.toLowerCase()))];
console.log(noDupes);

answered Oct 12 '21 at 20:31

Bryan Dellinger

4,724
7
33
79

3

This produces an array of distinct entries, but not the count. – Théophile Oct 12 '21 at 20:34
1

@BryanDellinger That would return the length of the array, not the occurrence of each unique item. – Spectric Oct 12 '21 at 20:37
@Bryan Dellinger Thank you for answering anyway: at least I have learned a new constructor for Set :-) – Oct 12 '21 at 20:56

count of word occurrences in a list, case insensitive

4 Answers4