0

I want to retrieve from an array all the elements that match multiple strings (all of them, and not necessarily whole words), like a search engine returning all results matching term_searched#1 && term_searched#2.

This is not a question about duplicates in the array (there are none), but about searching for a conjunction of terms: traditionally, a search is for one term, either by itself or in disjunction with others (a|b|c). I just want to search for (a && b && c).

I tried:

  • indexOf(): it only works with one element to locate in the array.
  • match(): there is no AND operator in a regular expression (only |, sadly; it would be so simple). So I tried to inject these regular expressions:
    • /(?=element1).*(?=element2)/gim
    • /(?=element1)(?=element2)/gim see here

The first regular expression works, but not every time: it seems very fragile...
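A minimal reproduction of that fragility, as a sketch only: the two sample strings are made up, and `element1` / `element2` stand in for any two search terms. The `g` flag is left out here because `test()` is stateful with it. With these inputs, the first pattern appears to depend on the order of the terms, and the second does not match either string.

```javascript
// Hypothetical sample strings; only the two patterns come from the attempts above.
const inOrder = 'xx element1 yy element2';
const reversed = 'xx element2 yy element1';

const chained = /(?=element1).*(?=element2)/i; // first attempt
const stacked = /(?=element1)(?=element2)/i;   // second attempt

const results = {
  chainedInOrder: chained.test(inOrder),   // terms in the same order as the pattern
  chainedReversed: chained.test(reversed), // terms in the opposite order
  stackedInOrder: stacked.test(inOrder),
  stackedReversed: stacked.test(reversed),
};
console.log(results);
```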

So I don't know whether I'm going in the right direction (match), or whether I just can't figure out the right regular expression... I need your advice.

// filter grid by searching on 'input' event
'input #search': (e)=> {
    var keypressed = e.currentTarget.value;

    // create array on 'space' input
    var keyarr = keypressed.toLowerCase().split(" ");

    // format each array's element into regex expression
    var keyarrReg = [];
    for(i = 0; i < keyarr.length; i++) {
        var reg = '(?=' + keyarr[i] + ')';
        keyarrReg.push(reg);
    }

    // array to regex string into '/(?=element1).*(?=element2)/gim' format
    var searching = new RegExp(keyarrReg.join(".*"), 'mgi');

    // set grid
    var grid = new Muuri('#gridre', {
        layout: {
            fillGaps: true,
        }
    });

    if (keypressed) {
        // filter all grid's items (grid of items is an array)
        grid.filter(function (item) {
            var searchoperator = item.getElement().textContent.toLowerCase().match(searching);
            // get items + only their text + lower-case their text + return true (not false) if the value ('keypressed') is found in them
            //var searchoperator = item.getElement().textContent.toLowerCase().indexOf(keypressed.toLowerCase()) != -1;
            return searchoperator;
        }
        [....]

    }
}

Edit: Gawil's answer adapted to my initial code (in case it helps)

// filter grid by searching on 'input' event
'input #search': (e)=> {
    var keypressed = e.currentTarget.value;

    // create array on 'space' input
    var keyarr = keypressed.toLowerCase().split(" ");

    // convert the array to a regex string, in a '^(?=.*word1)(?=.*word2).*$' format
    // here is Gawil's answer, formatted by Teemu 
    var searching = new RegExp('^(?=.*' + keyarr.join(')(?=.*') + ').*$', 'm');

    // set grid
    var grid = new Muuri('#gridre', {
        layout: {
            fillGaps: true,
        }
    });

    if (keypressed) {
        // filter all grid's items (grid of items is an array)
        grid.filter(function (item) {
            // get items + only their text + lower case their text + delete space between paragraphs
            var searchraw = item.getElement().textContent.toLowerCase().replace(/\r\n|\n|\r/gm,' ');
            var searchoperator = searchraw.match(searching);
            return searchoperator;
        }
        [....]

    }
}
Ontokrat
  • Possible duplicate of [Remove Duplicates from JavaScript Array](https://stackoverflow.com/questions/9229645/remove-duplicates-from-javascript-array) – lumio Aug 09 '17 at 12:20
  • @lumio what is the link with removing duplicates? – Ontokrat Aug 09 '17 at 12:24
  • You want to get duplicates of an array. Instead of removing you just collect them. – lumio Aug 09 '17 at 12:24
  • @lumio could you be more specific? – Ontokrat Aug 09 '17 at 12:29
  • You can use `match` and a `|` separated whitelist with `g` flag. Then remove the duplicates from the matches array, and compare its length to the count of keywords. If the counts equals, all the keywords have been found. – Teemu Aug 09 '17 at 13:34
  • @Teemu Sorry, but what do you mean by 'remove the duplicates from the matches array'? Also, 3 elements of the array can each match 2 given keywords - no? Why looking for a number equality between matched elements and keywords? But maybe I don't understand you. – Ontokrat Aug 09 '17 at 13:47
  • [See the fiddle](https://jsfiddle.net/j004qmad/). Yes, or as many keywords as were given. The number equality needs to be checked so we can be sure all the keywords were found; that is what makes the search "conjunctive". – Teemu Aug 09 '17 at 13:56
  • In other words, if you have an array like `{abcd, cdef, abef, fcab}`, and you want to match the strings `a`, `b`, and `c`, it should return `abcd` and `fcab`, did I get it right ? – Gawil Aug 09 '17 at 14:02
  • Whoops, `temp` definition in the dup checking was slipped to a wrong function, [fixed](https://jsfiddle.net/j004qmad/1/). – Teemu Aug 09 '17 at 14:09
  • @Teemu works so great! Thanks a lot. Would you please post your fiddle as an answer? Maybe with a little more explanation (it's far beyond my current knowledge of JS). – Ontokrat Aug 09 '17 at 14:16
  • OK, I'll post it as an answer, it'll just takes a while. – Teemu Aug 09 '17 at 14:19
  • @Gawil That's exactly the need. – Ontokrat Aug 09 '17 at 14:24
  • Ok, I was going to post an answer similar to another one I posted today : https://stackoverflow.com/a/45592450/7963408. Basically, it is like finding words containing specific letters, except it's words in place of letters. However, Teemu seems to have found a solution, so I'll let him post it :) – Gawil Aug 09 '17 at 14:31
  • @Teemu Just saw your solution, good work. I'll still post my own solution, just for the record, in case some people want a solution more regex-oriented ^^ – Gawil Aug 09 '17 at 14:34

2 Answers

3

The code below logs each element of the array that contains both words, cats and dog.
It uses the regex ^(?=.*word1)(?=.*word2).*$
To handle new lines, use this one instead:
^(?=(?:.|\n)*word1)(?=(?:.|\n)*word2)(?:.|\n)*$

You can add as many words as you want following the same logic, and the order of the words does not matter.

It is very similar to what you tried, except that you have to do all the (?=) checks at the start of the string, before matching it. Indeed, your first regex works only when the words are in the right order (element1 and then element2). Your second regex almost works, but its lookaheads have no leading .*, so both words would have to start at the same position, and it won't match anything.

var words = ["cats", "dog"];
var array = [
  "this is a string",
  "a string with the word cats",
  "a string with the word dogs",
  "a string with both words cats and dogs",
  "cats rule everything",
  "dogs rule cats",
  "this line is for dog\nbut cats prefer this one"
];

// Build one lookahead per word; (?:.|\n)* lets each lookahead scan across new lines
var regexString = "^";
words.forEach(function(word) { regexString += ("(?=(?:.|\\n)*" + word + ")"); });

var regex = new RegExp(regexString);

array.forEach(function(str) { // Loop through the array
  if (str.match(regex)) {
    console.log(str); // Display if all the words have been found
  }
});
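
One caveat when the words come from user input: characters such as `+`, `(` or `.` have a special meaning in a regex, so a keyword like `c++` would throw or match the wrong thing if concatenated as-is. A possible sketch of building the pattern with escaping follows. `escapeRegExp` and `buildAllWordsRegex` are hypothetical helper names, not part of the code above, and `[\s\S]` is used as an equivalent of `(?:.|\n)`.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: one lookahead per keyword, all anchored at the start.
function buildAllWordsRegex(words) {
  const lookaheads = words
    .filter(Boolean) // skip empty strings left by repeated spaces
    .map(function (word) {
      return '(?=[\\s\\S]*' + escapeRegExp(word) + ')';
    });
  return new RegExp('^' + lookaheads.join(''), 'i');
}

const allWords = buildAllWordsRegex(['c++', 'dog']);
console.log(allWords.test('my dog knows C++'));
console.log(allWords.test('my dog knows C'));
```

Since the pattern is only used to test for a match, the trailing `.*$` is left out of this sketch.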
  
Gawil
  • Thanks for this answer. To be precise, I want to find not only plain words. Like 'cats' && 'dog'. – Ontokrat Aug 09 '17 at 15:04
  • @Ontokrat What do you mean ? The regex works with anything, numbers, special characters, etc... – Gawil Aug 09 '17 at 15:07
  • I mean not only 'whole' words: I need a match on 'dogs', as on 'dog', as on 'do', etc. – Ontokrat Aug 09 '17 at 15:12
  • @Ontokrat You want it to return strings containing `do` when you search for `dogs` ? That seems weird, but if that's the case, I can adapt it ^^ Edit : It doesn't seem to be the case in Teemu's answer... When I search for `also`, it doesn't return the first string which contains `a`... I really don't understand ^^ If you mean it needs to match `dogs` when you search for `do`, then it's already the case ! – Gawil Aug 09 '17 at 15:15
  • Ok, for 'do' it's weird. But for 'dog', it makes sense that a user who is searching for 'dog' gets a match regardless the plural : 'dog' & 'dogs' – Ontokrat Aug 09 '17 at 15:18
  • @Ontokrat It is the case. If you search for `dog`, it will return strings containing `dogs`. However if you search for `dogs`, it won't return strings containing `dog`. That would be impossible to do with regex, since it's dependent on the language. For the same reasons, if you search for `wolf`, it can't possibly find `wolves` by itself. You'd need to add a dictionary to your code for that. – Gawil Aug 09 '17 at 15:20
  • We do understand each other: I just needed that a part match the whole (not the inverse of course), so `dog` can match `dog` and `dogs` - as `y` match `yes`. It is the case, so perfect. – Ontokrat Aug 09 '17 at 15:48
  • @Ontokrat I can't actually see any difference in the functionality between the codes. Only Gawil has much less code, and the RegExp is a bit harder to construct automatically, which is not a big deal. I applied [Gawil's RegExp to a fiddle](https://jsfiddle.net/j004qmad/5/) too, you can play with it as well. For some reason I totally forgot the lookaheads ... This is definitely a better answer = ). – Teemu Aug 09 '17 at 15:48
  • @Teemu I would not say my answer is better, just shorter ^^ Your answer might be faster, regex is known to be slow with most flavours. You're right about the regexp being harder to construct automatically, I'll update my answer to provide an example of how to do it, just in case someone would want to use it ^^ – Gawil Aug 09 '17 at 15:51
  • @Gawil I'm not sure, if it is even faster, though the regExp itself might be slightly faster, the dup-removing eats the advantage. But isn't it so, that the amount of lookaheads is limited to 10? In a case there would be more than 10 keywords, my code would still work correctly ..? – Teemu Aug 09 '17 at 15:55
  • @Teemu I've never heard about lookahead being limited in number, and I can't find anything about that... Are you sure about this ? Where do you get this from ? It would be something really important to know, and I hope I did not miss that ^^ – Gawil Aug 09 '17 at 15:58
  • The greatest captured `$` property is `9`, that's why I was thinking of a limitation, but it's only a guess. Just tested with the fiddle, no limits, at least 15 keywords was easily found. – Teemu Aug 09 '17 at 16:04
  • @Teemu Oh don't worry, you can definitely use more than 10 capturing groups ! Also, lookaheads are not capturing groups, so it wouldn't have been a problem anyway ^^ – Gawil Aug 09 '17 at 16:10
  • @Gawil Just a last issue: the elements of my array are multi-lines (each element has many paragraphs): without a `m`flag the code can't work (found after 1 hour of confusion). But after, I found that a search of 2 keywords inside a same paragraph is fine, but not if each of two keywords belongs to a different paragraph (regardless the order of the paragraphs found). Hope I'm clear.... – Ontokrat Aug 09 '17 at 17:12
  • @Gawil a `replace(/\r\n|\n|\r/gm,' ')` on each element of the array made the trick – Ontokrat Aug 09 '17 at 17:40
  • @Ontokrat Yeah, it is because `.` does not match new lines. You have to use the `s` flag for that, but this flag does not exist in javascript (as far as I know). However you can handle it by replacing `.*` inside the lookahead by `(?:.|\n)*`, which mean "any character or a new line". I updated my answer. – Gawil Aug 10 '17 at 07:31
  • @Gawil Duly noted. Thanks for taking so much time for this question! – Ontokrat Aug 10 '17 at 10:22
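
A note for readers on newer engines: since ES2018, JavaScript regular expressions do have an `s` (dotAll) flag, which makes `.` match new lines as well. A small sketch, assuming a runtime that implements ES2018 (the sample string is the multi-line one from the answer above):

```javascript
// Sample multi-line string taken from the answer's array.
const text = 'this line is for dog\nbut cats prefer this one';

const withoutDotAll = /^(?=.*cats)(?=.*dog)/;  // `.` stops at the new line
const withDotAll = /^(?=.*cats)(?=.*dog)/s;    // `s` flag: `.` also matches new lines

console.log(withoutDotAll.test(text));
console.log(withDotAll.test(text));
```

On older engines the `s` flag is a syntax error, so the `(?:.|\n)*` form remains the portable option.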
1

If I've understood your question correctly, you have an array of strings and some keywords, and an array element is accepted into the search results only if it contains all of the keywords.

You can use a "whitelist", i.e. a RegExp in which the keywords are separated with |. Then iterate through the array, and for every member create an array of matches against the whitelist. Remove the duplicates from the matches array, and check that all the keywords are in the list, simply by comparing the length of the matches array to the count of the keywords. Like so:

function searchAll (arr, keywords) {
    var txt = keywords.split(' '),
        len = txt.length,
        regex = new RegExp(txt.join('|'), 'gi'), // A pipe separated whitelist
        hits; // The final results to return, an array containing the contents of the matched members
    // Create an array of the rows matching all the keywords
    hits = arr.filter(function (row) {
        var res = row.match(regex), // An array of matched keywords
            final, temp;
        if (!res) {return false;}
        // Remove the dups from the matches array
        temp = {}; // Temporary store for the found keywords
        final = res.filter(function (match) {
            // The regex is case-insensitive, so "Some" and "some" count as the same keyword
            var key = match.toLowerCase();
            if (!temp[key]) {
                // Add the found keyword to store, and accept the keyword to the final array
                return temp[key] = true;
            }
            return false;
        });
        // Return matches count compared to keywords count to make sure all the keywords were found
        return final.length === len;
    });
    return hits;
}

var txt = "Some text including a couple of numbers like 8 and 9. More text to retrieve, also containing some numbers 7, 8, 8, 8 and 9",
  arr = txt.split('.'),
  searchBut = document.getElementById('search');
  
searchBut.addEventListener('change', function (e) {
  var hits = searchAll(arr, e.target.value);
  console.log(hits);
});
<input id="search">

The advantage of the whitelist is that you don't have to know the exact order of the keywords in the text, and the text can contain any characters.
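
For comparison, the same conjunctive check can be sketched without any regex, using `Array.prototype.every` and `String.prototype.includes`. This is a different technique from the whitelist above, not a rewrite of it; `searchAllPlain` and the sample rows are hypothetical names and data for illustration only.

```javascript
// Hypothetical helper: keep the rows that contain every keyword as a substring.
function searchAllPlain(arr, keywords) {
  // Split on any whitespace and drop empty terms left by repeated spaces
  const terms = keywords.toLowerCase().split(/\s+/).filter(Boolean);
  return arr.filter(function (row) {
    const text = row.toLowerCase();
    return terms.every(function (term) {
      return text.includes(term);
    });
  });
}

// Made-up sample data
const rows = ['Some text with 8 and 9', 'More text with 7 only'];
console.log(searchAllPlain(rows, 'text 9'));
```

Because no pattern is built from the input, special characters in the keywords need no escaping. Note that with an empty search string `every` returns true for each row, so all rows are kept.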

Teemu
  • for now the complete snippet can be found in the question's 9th comment. – Ontokrat Aug 09 '17 at 17:45
  • @Ontokrat It appeared, that my snippet doesn't work on some conditions, as Gawil's code still gives correct results. I'd suggest you to accept Gawil's answer instead, it is more robust. [In this fiddle](https://jsfiddle.net/j004qmad/6/) you can play with the both of the snippets, I've slightly changed Gawil's code to better fit with the other code in the fiddle, but the idea is still the original. – Teemu Aug 09 '17 at 17:51
  • I will if you are ok with that - both codes worked fine for me so far. Anyway, you totally deserve a special mention: you wrote a lot of fine & concise code, adapting mine & Gawil's. No need a `for` loop anymore, thanks to you. – Ontokrat Aug 09 '17 at 18:00
  • Yes, it is OK, especially when my answer didn't meet all the requirements. And also, this is your post, you can accept whatever answer you want. The loop is still there, hidden in `Array.filter` method. But the snippet has a simple and compact dup removing method which works pretty fluently, and it shows some use cases for `Array.filter` as well. – Teemu Aug 09 '17 at 18:06