Issue
I need to check whether each word of a string is spelled correctly by looking each word up in a MongoDB collection.
- Use as few DB queries as possible.
- The first word of each sentence is always capitalized, but that word may be stored in upper or lower case in the dictionary. So I need a case-sensitive match for every word except the first word of each sentence, which should be matched case-insensitively.
Sample string
This is a simple example. Example. This is another example.
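To make the "first word of each sentence" rule concrete, here is a minimal sketch that splits a string like the sample above into words and flags the sentence-initial ones. The helper name tagWords and the { word, first } shape are hypothetical and not part of the original attempt; the sketch also assumes a runtime that supports regex lookbehind (recent Node.js versions do).

```javascript
// Hypothetical helper: split a text into words and flag sentence-initial ones.
function tagWords(text) {
  const entries = [];
  // Split on whitespace that follows ., ? or ! and precedes a capital letter.
  const sentences = text.split(/(?<=[.?!])\s+(?=[A-ZÄÖÜ])/);
  for (const sentence of sentences) {
    const words = sentence
      .replace(/[^a-zA-Z0-9äöüÄÖÜß ]/g, '') // strip punctuation
      .split(' ')
      .filter(Boolean); // drop empty strings caused by repeated spaces
    words.forEach((word, index) => {
      entries.push({ word, first: index === 0 });
    });
  }
  return entries;
}

const entries = tagWords('This is a simple example. Example. This is another example.');
```

Keeping the flag next to each word means the later lookup can decide per word whether case should matter.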
Dictionary structure
Assume there is a dictionary collection like this
{ word: 'this' },
{ word: 'is' },
{ word: 'a' },
{ word: 'example' },
{ word: 'Name' }
In my case, there are 100,000 words in this dictionary. Of course names are stored capitalized, verbs are stored in lower case, and so on...
Expected result
The words 'simple' and 'another' should be recognized as 'misspelled' words, as they do not exist in the DB.
In this case, the array of all existing words should be: ['This', 'is', 'a', 'example']. 'This' is upper case as it is the first word of a sentence; in the DB it is stored as lower case 'this'.
My attempt so far (Updated)
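// Split the string into sentences at ., ? or ! followed by a capital letter
const sentences = string.replace(/([.?!])\s*(?= [A-Z])/g, '$1|').split('|');
let search = [],
    words = [],
    existing,
    missing;

sentences.forEach(sentence => {
  // Strip punctuation, then split the sentence into words
  const w = sentence.trim().replace(/[^a-zA-Z0-9äöüÄÖÜß ]/gi, '').split(' ');
  w.forEach((word, index) => {
    // Only the first word of a sentence is matched case-insensitively
    const regex = new RegExp(['^', word, '$'].join(''), index === 0 ? 'i' : '');
    search.push(regex);
    words.push(word);
  });
});

// One query for all words; returns the dictionary spelling of each match
existing = Dictionary.find({
  word: { $in: search }
}).map(obj => obj.word);

missing = _.difference(words, existing);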
Problem
- The case-insensitive matches don't work properly: /^Example$/i will give me a result, but existing will then contain the original lower case 'example', which means 'Example' ends up in the missing array. So the case-insensitive search is working as expected, but the result arrays have a mismatch. I don't know how to solve this.
- Is it possible to optimize the code? I'm using two forEach loops and a difference ...
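The mismatch described above comes from comparing the words as written against the dictionary spellings. One possible way around it, sketched here under assumptions rather than as a tested Meteor solution, is to remember which words are sentence-initial and compare those in lower case, while all other words are compared exactly. The helper name splitFoundMissing and the { word, first } entries are hypothetical, and sampleExisting stands in for the array that the Dictionary.find(...).map(...) call would return.

```javascript
// Hypothetical helper: classify each word using the dictionary spellings
// that came back from the single query.
function splitFoundMissing(entries, existing) {
  const exact = new Set(existing);                            // case-sensitive lookup
  const lower = new Set(existing.map(w => w.toLowerCase()));  // case-insensitive lookup
  const found = new Set();
  const missing = new Set();
  for (const { word, first } of entries) {
    // Sentence-initial words ignore case; every other word must match exactly.
    const known = first ? lower.has(word.toLowerCase()) : exact.has(word);
    (known ? found : missing).add(word);
  }
  return { found: [...found], missing: [...missing] };
}

// Stand-in data for the sample string and the sample dictionary result.
const sampleEntries = [
  { word: 'This', first: true }, { word: 'is', first: false },
  { word: 'a', first: false }, { word: 'simple', first: false },
  { word: 'example', first: false }, { word: 'Example', first: true },
  { word: 'This', first: true }, { word: 'is', first: false },
  { word: 'another', first: false }, { word: 'example', first: false }
];
const sampleExisting = ['this', 'is', 'a', 'example'];

const result = splitFoundMissing(sampleEntries, sampleExisting);
// result.missing → ['simple', 'another']
```

With this stand-in data, 'Example' counts as found even though the dictionary stores 'example', and the single query is kept; the two Sets replace the difference call.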