I have written a regex, but I can't get it to match the words in order.

var $result = [];
var url_check = "CentOS-7-x86_64-LiveGNOME-1804";
var torrent_forbidden = ["CentOS-7 live", "Centos 7 livegnome", "Cent-7", "OS Cent-7", "centos:7", "centos word:7", "centos:6", "cento 7 s"];
jQuery.each(torrent_forbidden , function(index, torrent_forbidden) { 
    var regex = new RegExp('^(?=.*?' + torrent_forbidden.replace(/[.*+?^${}()|[\]\\]/g, '\\$&').split(/\\?[\s,_.:*-]+/).join(')(?=.*?') + ')', 'gi');
    if(regex.test(url_check) === true){
        $result.push(torrent_forbidden + ' : true');
    }else{
        $result.push(torrent_forbidden + ' : false');
    }
});
console.log($result);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

Here is what I get, and what I expect to get, for the string CentOS-7-x86_64-LiveGNOME-1804:

|-----------------------------------------|
| Search             | Result | Expected  |
|-----------------------------------------|
| CentOS-7 live      | true   | false     |
| Centos 7 livegnome | true   | true      |
| Cent-7             | true   | false     |
| OS Cent-7          | true   | false     |
| centos:7           | true   | true      |
| centos word:7      | false  | false     |
| centos:6           | true   | false     |
| cento 7 s          | true   | false     |
|-----------------------------------------|
executable

2 Answers

You're creating regexes from strings like ^(?=.*?CentOS)(?=.*?7)(?=.*?live), which only check that each word occurs somewhere in the string. They lack some restrictions:

  • each word may be followed only by - (or some other separator) or the end of the string
  • each word must be at the beginning of the string or right after - (or some other separator)

So you need to create a lookahead like:

(?=^(.*separators)?someword(separators|$))  

instead of:

(?=.*?someword)

(With - as the only separator, it would be: (?=^(.*[-])?someword([-]|$)))
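To make the difference between the two lookaheads concrete, here is a minimal sketch that uses only - as the separator. The helper names `looseWord` and `wholeWord` are made up for this illustration, and the words passed in are assumed to contain no regex metacharacters.

```javascript
// Hypothetical helpers, for illustration only.
var url = "CentOS-7-x86_64-LiveGNOME-1804";

// Original style: the word may appear anywhere, even inside a longer word.
function looseWord(word) {
    return new RegExp('^(?=.*?' + word + ')', 'i');
}

// Restricted style: the word must start at the beginning of the string or
// right after "-", and must end right before "-" or the end of the string.
function wholeWord(word) {
    return new RegExp('^(?=^(.*[-])?' + word + '([-]|$))', 'i');
}

console.log(looseWord('Cent').test(url));   // true: "Cent" is inside "CentOS"
console.log(wholeWord('Cent').test(url));   // false: "Cent" is followed by "O"
console.log(wholeWord('CentOS').test(url)); // true
console.log(wholeWord('7').test(url));      // true
```

With several words, one such lookahead per word is concatenated, which is what the snippet in this answer does.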

var $result = [];
var url_check = "CentOS-7-x86_64-LiveGNOME-1804";
var torrent_forbidden = ["CentOS-7 live", "Centos 7 livegnome", "Cent-7", "OS Cent-7", "centos:7", "centos word:7", "centos:6", "cento 7 s", "entOS-7", "*centos*"];
jQuery.each(torrent_forbidden , function(index, torrent_forbidden) { 
    var regexstr = '^(?=^(.*[-])?' + torrent_forbidden.replace(/[.*+?^${}()|[\]\\]/g, '\\$&').split(/\\?[\s,_.:*-]+/).join('([-]|$))(?=^(.*[-])?') + '([-]|$))';
    console.log(regexstr); // print the generated pattern for debugging
    var regex = new RegExp(regexstr, 'gi');
    if(regex.test(url_check) === true){
        $result.push(torrent_forbidden + ' : true');
    }else{
        $result.push(torrent_forbidden + ' : false');
    }
});
console.log($result);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
barbsan
  • What's the logic behind: `Centos 7 livegnome : true"`? – zer00ne Jan 03 '19 at 12:01
  • @zer00ne string "CentOS-7-x86_64-LiveGNOME-1804" contains words "Centos", "7" and "livegnome" (it's case insensitive) – barbsan Jan 03 '19 at 12:04
  • Little issue, if I search like `entOS-7` it will be true – executable Jan 03 '19 at 12:05
  • @executable then add condition that it has to start with `-` or start of string – barbsan Jan 03 '19 at 12:05
  • What is the meaning of `-` ? – executable Jan 03 '19 at 12:07
  • @executable it's character `-` that seems to be separator for words in your string (for what I understand you can match only words in array `url_check.split('-')` ) – barbsan Jan 03 '19 at 12:12
  • I use this to separate `split(/\\?[\s,_.:*-]+/)` – executable Jan 03 '19 at 12:13
  • @executable I added regex for start of word, hope you're capable of replacing `-` with list of separators you need – barbsan Jan 03 '19 at 12:22
  • Correct me if I'm wrong `var regex = new RegExp('^(?=^(.*[\s,_.:*-])?' + torrent_forbidden.replace(/[.*+?^${}()|[\]\\]/g, '\\$&').split(/\\?[\s,_.:*-]+/).join('([\s,_.:*-]|$))(?=^(.*[\s,_.:*-])?') + '([\s,_.:*-]|$))', 'gi');` – executable Jan 03 '19 at 12:42
  • @executable seems fine – barbsan Jan 03 '19 at 12:55
  • I tested with `*centos*`, it seems like it doesn't split the `*`, I expect it to be true – executable Jan 03 '19 at 13:19
  • @executable `torrent_forbidden.replace(/[.*+?^${}()|[\]\\]/g, '\\$&').split(/\\?[\s,_.:*-]+/)` results in ["", "centos", ""], you need to remove empty strings from that array before `join` – barbsan Jan 03 '19 at 13:24
  • Alright, I made this `var regexstr = '^(?=^(.*[\s,_.:*-])?' + torrent_interdit.replace(/[.*+?^${}()|[\]\\]/g, '\\$&').split(/\\?[\s,_.:*-]+/).filter(function(e){ return e.replace(/(\r\n|\n|\r)/gm,"")}).join('([\s,_.:*-]|$))(?=^(.*[\s,_.:*-])?') + '([\s,_.:*-]|$))'` It looks like it's working – executable Jan 03 '19 at 13:29
  • @executable it's really worth creating the string first and then passing it to `RegExp`, so that you can debug it and see what it contains – barbsan Jan 03 '19 at 13:31
  • You mean to decompose the regex ? – executable Jan 03 '19 at 13:33
  • I made some more test and it seems like it doesn't check in order. https://jsbin.com/qosafihabo/1/edit?js,console – executable Jan 04 '19 at 14:03
  • No, it searches for each word separately, as your regex did. Look, you didn't write any requirements, you included some code you obtained in another question, and you keep writing "look, I've found yet another case I wanted to cover". And that jsbin even lacks its dependency. – barbsan Jan 04 '19 at 14:25

This is a solution. It might not be perfect, so you should state your requirements more precisely if it fails in some cases. The regex tests whether the words appear in order and are delimited by the separators (or by the beginning or end of the string). It matches entire words (x86 will be detected, but not x8). There can be other words between those specified. Some explanations:

  • positive lookaheads don't consume characters, so I don't think you can combine them to guarantee an order; they only guarantee that each word is present somewhere ahead
  • you need double escaping (\\) if you build a RegExp from a string
  • no need to use filter to do another replace (it was in the duplicate post) -> EDIT: restored because it serves another purpose: removing empty strings
  • you can use ?: to make a group non-capturing, whether it holds an alternation or is there to be quantified
  • if you have other questions about the regex, just ask.
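The first two points can be checked with a short sketch. The sample string and the variable names below are chosen only for illustration.

```javascript
var sample = "7-CentOS"; // the words appear in the "wrong" order

// Lookaheads all start from the same position, so order is not enforced.
var lookaheads = new RegExp('^(?=.*centos)(?=.*7)', 'i');
// A sequential pattern consumes characters, so "centos" must come before "7".
var sequential = new RegExp('centos.*7', 'i');

console.log(lookaheads.test(sample)); // true
console.log(sequential.test(sample)); // false

// In a string literal, '\s' is just the letter "s": the backslash is lost
// before RegExp ever sees it. '\\s' keeps the backslash.
var singleEscaped = new RegExp('[\s]');
var doubleEscaped = new RegExp('[\\s]');

console.log(singleEscaped.test(' ')); // false: the class only contains "s"
console.log(doubleEscaped.test(' ')); // true: the class matches whitespace
```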

var $result = [];
var url_check = "CentOS-7-x86_64-LiveGNOME-1804";
var torrent_forbidden = ["CentOS-7 live", "Centos 7 livegnome", "Cent-7", "OS Cent-7", "centos:7", "centos word:7", "centos:6", "cento 7 s", "CentOS x86", "CentOS x8", "*CentOS*"];
jQuery.each(torrent_forbidden , function(index, torrent_forbidden) { 
    var regexstr = '(?:^|[\\s,_.:*-])' + torrent_forbidden
      .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
      .split(/\\?[\s,_.:*-]+/)
      .filter( function(e){ return e.replace(/(\r\n|\n|\r)/gm,""); } )
      .join('(?:(?:[\\s,_.:*-][^\\s,_.:*-]+)+)?[\\s,_.:*-]') + '(?:[\\s,_.:*-]|$)';
    console.log(regexstr); //To debug your regexes
    var regex = new RegExp(regexstr, 'gi');
    if(regex.test(url_check) === true){
        $result.push(torrent_forbidden + ' : true');
    }else{
        $result.push(torrent_forbidden + ' : false');
    }
});
console.log($result);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
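For reuse outside jQuery, the same idea could be wrapped in a function. This is only a sketch under a few assumptions: the name `buildOrderedRegex` is hypothetical, the separator set is the one used above, and splitting happens before escaping so that `*` in a search term still acts as a separator.

```javascript
// Hypothetical helper: builds a regex that matches when every word of `term`
// appears, in the order given by `term`, as a whole word in the tested string.
var SEP = '[\\s,_.:*-]';      // one separator character
var NOT_SEP = '[^\\s,_.:*-]'; // one non-separator character

function buildOrderedRegex(term) {
    var words = term
        .split(/[\s,_.:*-]+/)  // split first, so "*" still acts as a separator
        .filter(Boolean)       // drop empty strings, e.g. from "*CentOS*"
        .map(function (w) {    // then escape regex metacharacters per word
            return w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        });
    if (words.length === 0) {
        return null;           // nothing to search for
    }
    // Between two words: optional extra words, then one separator.
    var gap = '(?:(?:' + SEP + NOT_SEP + '+)+)?' + SEP;
    // No "g" flag: test() on a global regex keeps state in lastIndex.
    return new RegExp('(?:^|' + SEP + ')' + words.join(gap) + '(?:' + SEP + '|$)', 'i');
}

var fileName = "CentOS-7-x86_64-LiveGNOME-1804";
["Centos 7 livegnome", "CentOS-7 live", "7 centos", "*CentOS*"].forEach(function (term) {
    console.log(term + ' : ' + buildOrderedRegex(term).test(fileName));
});
```

Leaving out the `g` flag is a deliberate choice here: a regex returned by the helper may be reused for several `test()` calls, and a global regex would resume from its previous `lastIndex`.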
Kaddath
  • Thank you a lot for your answer ! I'll make some testing – executable Jan 04 '19 at 15:41
  • I found an issue when my search value is like `*centos*`. I think it's because the empty strings are not removed; the `filter` function fixed that – executable Jan 04 '19 at 15:58
  • You're right, I restored the filter. I find it strange that this should be a positive result, since the term has `*` at the beginning while the file name has nothing before `CentOS`, but taking that into account would require changes – Kaddath Jan 04 '19 at 16:15
  • I think it's because it wasn't escaped, so it breaks the regular expression. So we can't detect the words in the order of the filename? – executable Jan 07 '19 at 08:09