I have two sentences and I would like to find all the words they share, regardless of capitalization or punctuation. Currently this is what I am doing:

    const searchWords = sentence1.split(" ");
    const wordList = sentence2.split(" ");
    const matchList = wordList.filter(value => -1 !== searchWords.indexOf(value));

It works OK, but obviously capitalization and punctuation cause issues. I know I need to incorporate something like .match() in there, but I don't know how to work with it. I am sure this is something someone has done before; I just haven't found the code yet. Any references are also appreciated.

Thank you,

Best

This dude.

2 Answers

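If you're looking for any words that match, you can extract the words of the first string with String.prototype.match and check each one against the second string using String.prototype.search with a RegExp built from the word and the i flag for case insensitivity. The \b word boundaries keep a short word such as "is" from matching inside a longer one such as "this".

    function compare(str1, str2) {
        // Every run of word characters in str1; punctuation is skipped by \w+.
        const words = str1.match(/\w+/g) || [];
        // Keep the words that occur in str2 as whole words, ignoring case.
        return words.filter(m => str2.search(new RegExp(`\\b${m}\\b`, "i")) >= 0);
    }

    console.log( compare("Hello there this is a test", "Hello Test this is a world") );
    // ["Hello", "this", "is", "a", "test"]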

If you're looking for specific words that match, you can use functional composition: split each string into an Array, filter each by the list of allowed words, and then filter one result against the other.

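    function compare(str1, str2, matchables) {
        // Curried predicate: given an array, test whether it contains an item.
        const containFilter = (a) => (i) => a.includes(i);
        // Lowercase a sentence, split it on spaces, and keep only the words listed in matchables.
        const matchFilter = (s) => s.toLowerCase().split(" ").filter(containFilter(matchables));

        // Keep the allowed words of str1 that are also among the allowed words of str2.
        return matchFilter(str1).filter(containFilter(matchFilter(str2)));
    }

    const matchables = ["hello", "test", "world"];
    console.log( compare("Hello there this is a test", "Hi Test this is a world", matchables) );
    // ["test"]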
zfrisch
  • The first example doesn't work correctly for compare("Chapter 1 test.", "Chapter one test"); it only matches test. – Tadewos Bellete Aug 30 '19 at 21:53
  • @TadewosBellete Thanks for letting me know! I simply used a `>` operator instead of a `>=` on the index check. – zfrisch Aug 30 '19 at 22:26
  • Thanks dude! Can I add an expression in the new RegExp(m,'i') to make it ignore punctuation? – Tadewos Bellete Aug 30 '19 at 22:45
  • @TadewosBellete Sure. There are a few ways to do it. You can strip the punctuation characters from the string before the search, or you can adjust the RegEx manually. `\w` is equivalent to `[A-Za-z0-9_]`, so if you want to exclude the underscore, simply change `\w+` to `[A-Za-z0-9]+`. This all depends on your use case, though. Does that help? – zfrisch Sep 01 '19 at 00:27
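
A minimal sketch of the second option from the comment above, assuming words consist only of ASCII letters and digits. The helper names `tokenize` and `sharedWords` are hypothetical and not part of the answer, and the sketch swaps the per-word RegExp search for a `Set` lookup.

```javascript
// Hypothetical helper: lowercase the text and extract runs of ASCII letters and digits,
// so punctuation and underscores never become part of a word.
function tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Hypothetical helper: the distinct lowercased words of str1 that also occur in str2.
function sharedWords(str1, str2) {
    const inSecond = new Set(tokenize(str2));
    return [...new Set(tokenize(str1))].filter(word => inSecond.has(word));
}

console.log(sharedWords("Chapter 1, test.", "chapter one TEST!"));
// → ["chapter", "test"]
```

Because both sentences go through the same tokenizer, punctuation and capitalization are handled once, up front, rather than inside each comparison.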

I think you may be over-thinking this. Would just converting both sentences to an array and using a for loop to cycle through the words work? For example:

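    var searchWords = sentence1.split(" ");
    // Lowercase the second sentence once so each lookup is case-insensitive.
    var wordList = sentence2.toLowerCase().split(" ");
    var commonWords = [];
    for (var i = 0; i < searchWords.length; i++) {
        // Keep the original-case word from sentence1 when its lowercase form is in sentence2.
        if (wordList.includes(searchWords[i].toLowerCase())) {
            commonWords.push(searchWords[i]);
        }
    }
    console.log(commonWords);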

Or some variation of that.

As for the punctuation, you could add .replace(/[^A-Za-z0-9\s]/g,"") to the end of searchWords[i].toLowerCase(), as mentioned in this answer: https://stackoverflow.com/a/33408855/10601203 (the words in wordList need the same cleanup, otherwise "test." in sentence2 still won't match "test").
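
As a rough sketch of how that could fit into the loop above, the same cleanup can be applied to the words of both sentences before comparing. The `normalize` and `commonWords` function names and the sample sentences are assumptions for illustration, not part of the linked answer.

```javascript
// Hypothetical helper: lowercase a word and drop everything except letters, digits, and whitespace.
function normalize(word) {
    return word.toLowerCase().replace(/[^A-Za-z0-9\s]/g, "");
}

// Hypothetical wrapper around the loop: returns the cleaned words of sentence1
// that also appear in sentence2.
function commonWords(sentence1, sentence2) {
    var searchWords = sentence1.split(" ");
    var wordList = sentence2.split(" ").map(normalize);
    var result = [];
    for (var i = 0; i < searchWords.length; i++) {
        var cleaned = normalize(searchWords[i]);
        // Skip tokens that were pure punctuation and are now empty.
        if (cleaned !== "" && wordList.includes(cleaned)) {
            result.push(cleaned);
        }
    }
    return result;
}

console.log(commonWords("Hello, World! This is a test.", "hello there; a TEST?"));
// → ["hello", "a", "test"]
```

Unlike the original loop, this version returns the cleaned, lowercased words; push `searchWords[i]` instead if the original spelling should be kept.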

Jesse
  • Yeah, haha, maybe you're right. It's working well, but punctuation is not ignored. Might not need it to be, though. – Tadewos Bellete Aug 30 '19 at 21:59
  • I added an edit for the punctuation. You can add `.replace(/[^A-Za-z0-9\s]/g,"")` to the end of `searchWords[i].toLowerCase()` as mentioned in the following answer: https://stackoverflow.com/a/33408855/10601203 – Jesse Aug 30 '19 at 22:10