I am trying to extract the sentences from a list that don't contain any of the words from a word list.

Both lists contain letters and numbers, in upper case and lower case.

I have managed to extract the words that the list of sentences contains, but for some reason I can't extract the sentences from the sentence list that don't contain any of the words from the word list.

Here is some pseudocode showing the input versus the expected output:

//input
var list1 = ["sentence with word1", "sentence with word2", "sentence without 3"];
var list2 = ["word1", "word2", "word3"];


//To fill out
var list1ContainedWords = [];
var list1DidntContainWords = [];

var extract = function (list1, list2) {

}

//Expected output
list1ContainedWords = ["word1", "word2"];
list1DidntContainWords = ["sentence without 3"];

2 Answers

Generate a regex from the second array and check each sentence for a pattern match using the `RegExp#test` method.

var extract = function(list1, list2) {
  // object for storing the result; structure it however you like
  var res = {
    contains : [],
    notContains : []
  };

  // generate a regex from the strings in the second list
  // so that it matches any one of them
  var regex = new RegExp(list2.map(function(v) {
    // escape any symbol that has a special meaning in a regex;
    // add word boundaries if you want whole-word matches
    // (they can go around each word here, or once around a group wrapping all words)
    return '\\b' + v.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&') + '\\b';
    // join the escaped words with the pipe symbol (alternation, i.e. "or")
  }).join('|'));

  // alternatively, apply the word boundaries once around a capturing group:
  // '\\b(' + list2.map(....).join('|') + ')\\b'
  // or around a non-capturing group:
  // '\\b(?:' + list2.map(....).join('|') + ')\\b'

  // iterate over the first list
  list1.forEach(function(v) {
    // if the pattern matches, push the sentence into the contains property
    if (regex.test(v))
      res.contains.push(v);
    // otherwise push it into the notContains property
    else
      res.notContains.push(v);
  });

  // return the result object
  return res;
}

//input
var list1 = ["sentence with word1", "sentence with word2", "sentence without 3"];
var list2 = ["word1", "word2", "word3"];


//To fill out
var list1ContainedWords = [];
var list1DidntContainWords = [];

var extract = function(list1, list2) {
  var res = {
    contains: [],
    notContains: []
  };
  var regex = new RegExp(list2.map(function(v) {
    return v.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&');
  }).join('|'));
  list1.forEach(function(v) {
    if (regex.test(v))
      res.contains.push(v);
    else
      res.notContains.push(v);
  })
  return res;
}

console.log(extract(list1, list2));
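
The function above returns whole sentences in `contains`, whereas the question's expected output lists the matched words themselves. A minimal sketch of one way to collect the words instead, assuming a non-empty word list and whole-word matching; the names `escapeRegExp` and `extractWords` are hypothetical and not part of the original answer:

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(s) {
  return s.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&');
}

// Hypothetical function: returns the unique matched words and the sentences
// that contain none of the words. Assumes `words` is not empty.
function extractWords(sentences, words) {
  // The "g" flag makes String#match return every match in a sentence.
  var regex = new RegExp('\\b(?:' + words.map(escapeRegExp).join('|') + ')\\b', 'g');
  var containedWords = [];
  var didntContainWords = [];

  sentences.forEach(function (sentence) {
    var matches = sentence.match(regex); // array of matches, or null if none
    if (matches === null) {
      didntContainWords.push(sentence);
      return;
    }
    matches.forEach(function (word) {
      // keep each matched word only once
      if (containedWords.indexOf(word) === -1) containedWords.push(word);
    });
  });

  return { containedWords: containedWords, didntContainWords: didntContainWords };
}

var result = extractWords(
  ["sentence with word1", "sentence with word2", "sentence without 3"],
  ["word1", "word2", "word3"]
);
console.log(result.containedWords);    // ["word1", "word2"]
console.log(result.didntContainWords); // ["sentence without 3"]
```

Passing `'gi'` instead of `'g'` would make the match case-insensitive, if that is what the mixed-case input requires.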

See also: Converting user input string to regular expression
Pranav C Balan
Something like this might be a good solution:

//input
var list1 = ["sentence with word1", "sentence with word2", "sentence without 3"];
var list2 = ["word1", "word2"];


//To fill out
var list1ContainedWords = [];
var list1DidntContainWords = [];

var extract = function (list1, list2) {
  list1.forEach(function(item) {
    var found = false;
    list2.forEach(function(item2) {
      if (item.indexOf(item2) > -1) {
        if (list1ContainedWords.indexOf(item2) === -1) {
          list1ContainedWords.push(item2);
        }
        found = true;
      }
    }) 

    if (!found) {
      list1DidntContainWords.push(item)
    }
  })
}

extract(list1, list2);
console.log(list1ContainedWords, list1DidntContainWords);

Essentially, this loops through both arrays, checks whether each word appears in each sentence, and keeps track of the results.
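
The same nested-loop idea can be sketched without shared outer variables by using `Array#filter` and `Array#some`. This variant also compares in lower case, which is an assumption based on the question's mention of mixed-case input; the name `splitBySubstring` is hypothetical:

```javascript
// Hypothetical function: substring check of every word against every sentence,
// ignoring case. Returns the words found somewhere and the sentences with no word.
function splitBySubstring(sentences, words) {
  var loweredWords = words.map(function (w) { return w.toLowerCase(); });
  var loweredSentences = sentences.map(function (s) { return s.toLowerCase(); });

  // words that occur in at least one sentence
  var containedWords = words.filter(function (w, i) {
    return loweredSentences.some(function (s) {
      return s.indexOf(loweredWords[i]) > -1;
    });
  });

  // sentences in which none of the words occur
  var didntContainWords = sentences.filter(function (s, i) {
    return !loweredWords.some(function (w) {
      return loweredSentences[i].indexOf(w) > -1;
    });
  });

  return { containedWords: containedWords, didntContainWords: didntContainWords };
}

var split = splitBySubstring(
  ["Sentence with WORD1", "sentence with word2", "sentence without 3"],
  ["word1", "word2", "word3"]
);
console.log(split.containedWords, split.didntContainWords);
```

Because this is a substring check, a word such as `word1` would also match inside `word10`; use a regex with word boundaries if whole-word matching is needed.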

Kody
  • This has a very high time complexity, ~ `O(n^2)`. Checking with a regex will finish in ~ `O(n)`. – AdityaReddy Jan 17 '17 at 17:22
  • @AdityaReddy yes, after seeing the regex solution I would agree. :) – Kody Jan 17 '17 at 17:24
  • @AdityaReddy So you assume the regex test is an O(1) operation. I am pretty sure it's quite slow, especially if the regex contains alternations such as `|`. – Redu Jan 18 '17 at 15:17
  • @AdityaReddy & redu I threw together a quick jsperf https://jsperf.com/foreach-vs-regex - Interesting results in terms of actual speed here. JSPerf is not perfect but it does look like my solution is faster. It is also worth noting, that the solution by Aditya has gotten more complex since my first comment. – Kody Jan 18 '17 at 15:46