I am writing a migration utility, and part of one of the conversions involves testing a pair of regular expressions against each other. For example, some of the tests will be:

+-------+-------+-------+
| Left  | Right | Match |
+-------+-------+-------+
| (.)01 | 101   | Yes   |
+-------+-------+-------+
| (.)02 | 101   | No    |
+-------+-------+-------+
| 101   | 101   | Yes   |
+-------+-------+-------+
| 201   | (.)01 | Yes   |
+-------+-------+-------+
| (.)01 | 2(.)1 | Yes   |
+-------+-------+-------+

At the moment, my test checks each pair using .matches, which works when only one side is a regular expression. When both sides are regular expressions (i.e. the last row of the example), it returns false when it should return true.
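For reference, the check is roughly equivalent to the sketch below. The names `TwoWayMatch` and `eitherMatches` are made up for this illustration, and the real utility may differ in detail. It treats each side in turn as the regex and the other side as the input string, which covers the first four rows but not the last one.

```java
final class TwoWayMatch {
    // Treats each side in turn as the regex and the other side as plain input.
    static boolean eitherMatches(String left, String right) {
        return right.matches(left) || left.matches(right);
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"(.)01", "101"}, {"(.)02", "101"}, {"101", "101"},
            {"201", "(.)01"}, {"(.)01", "2(.)1"}
        };
        for (String[] row : rows) {
            // The last row comes out false, because neither side is a plain string.
            System.out.println(row[0] + " vs " + row[1] + " -> " + eitherMatches(row[0], row[1]));
        }
    }
}
```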

So, how can I get a positive comparison result for the last example?

topherg
  • How come the last one returns `Yes`? It should be `No` – Sanjeev Jul 01 '14 at 11:56
  • Because the first character of the `Left` is wild, and the second character of the `Right` is wild – topherg Jul 01 '14 at 11:57
  • but the last character does not match – Sanjeev Jul 01 '14 at 11:57
  • Now it makes sense :) – Sanjeev Jul 01 '14 at 11:58
  • Is this example oversimplified, or is `(.)` really the only regular expression mechanism used here? – Pshemo Jul 01 '14 at 12:01
  • Why would `(.)01` match `2(.)1`? For example `(.)01` matches `901` and `2(.)1` matches `281` but `901` and `281` are very different. Please explain in more detail what you are trying to achieve and also what you have tried so far. – AdrianHHH Jul 01 '14 at 12:02
  • @Pshemo it is simplified, but it's either `(.)` or `(.*)` – topherg Jul 01 '14 at 12:02
  • @AdrianHHH It is the nature of the software I work with, it's all about internal identification methods of functions. The first, second, and third characters all represent separate features of what is done – topherg Jul 01 '14 at 12:03
  • Can `(.)` or `(.*)` appear only once in a regex, or are several of them possible, as in `1(.)2(.)3(.*)4`? – Pshemo Jul 01 '14 at 12:04
  • @Pshemo potentially if the core of the software changes, but for the moment I am working on the assumption it will not. If it does, I'll come back and change it, but not now – topherg Jul 01 '14 at 12:06
  • Are there some characters that are not allowed in input for instance alphabetic or `?` `!` ? – Pshemo Jul 01 '14 at 12:09
  • @Pshemo the majority of the rules are numeric, but there are some special functions that have letters, but those only go as high as `H`, but no special characters – topherg Jul 01 '14 at 12:11
  • There is no exact description of what you want to do, except some random examples. You should first specify in words what you want to do. Is the last column the answer to the question "Is it possible to construct one string which will match both the left and the right regexes?". If so, you should clarify that. – Alderath Jul 01 '14 at 12:27
  • Using only decimal digits, the first line left `(.)01` gives the set {001, 101, 201, ..., 901} and the right value `101` is a member of that set. Similarly the fourth line right has `(.)01` which gives the same set and the left value 201 is a member. The last line gives two sets: `(.)01` gives {001, 101, 201, ..., 901} and `2(.)1` gives {201, 211, 221, ..., 291}. Are you asking whether the intersection of the two sets is not empty? – AdrianHHH Jul 01 '14 at 12:28

2 Answers

You're using regular expressions wrong. You should use a regex to match Strings against it, not to test regexes against each other.

A regex represents a set of possible matches: (.)01 matches a01, 301, $01, etc, etc...

So, doing this makes sense when you match one item from that set, e.g. $01, back against the regex.

In your last case, you're attempting to match a regex with a regex, which is just silly. Which regex is your source and which String is your target? If the first regex is your source, 201 matches it, but also 101, #01, etc... But this is not right according to the second regex, which matches items like 201, but also 2#1 and 291. So they should not be considered 'matching each other'.

Take a look at this Venn Diagram:

[Figure: Venn diagram of two overlapping circles, A and B, with the overlap shaded darker]

Your last regex match-up has two regexes fighting each other. The first regex is represented by circle A. The second regex is represented by circle B.

There are elements (well, just 201) which are/is both in circle A and circle B (pointed out by the darker colored both A & B). Would you consider these circles to be matching? I certainly don't. I would if they covered each other exactly.
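To make the overlap concrete, here is a brute-force sketch that lists the strings sitting in both circles. It assumes a small alphabet of digits plus `A` to `H` (taken from the question's comments) and a fixed length of 3. `CommonMatches` and `common` are hypothetical names for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class CommonMatches {
    // Assumed alphabet: digits plus A-H, as described in the question's comments.
    static final String ALPHABET = "0123456789ABCDEFGH";

    // Lists every string of the given length that both regexes match.
    static List<String> common(String regexA, String regexB, int length) {
        List<String> result = new ArrayList<>();
        collect("", length, Pattern.compile(regexA), Pattern.compile(regexB), result);
        return result;
    }

    // Builds candidates one character at a time and keeps those in both sets.
    private static void collect(String prefix, int remaining, Pattern a, Pattern b, List<String> out) {
        if (remaining == 0) {
            if (a.matcher(prefix).matches() && b.matcher(prefix).matches()) {
                out.add(prefix);
            }
            return;
        }
        for (char c : ALPHABET.toCharArray()) {
            collect(prefix + c, remaining - 1, a, b, out);
        }
    }

    public static void main(String[] args) {
        System.out.println(common("(.)01", "2(.)1", 3)); // prints [201]
    }
}
```

Enumerating like this only scales to short strings and small alphabets, so treat it as a way to inspect the sets rather than as a production check.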

But the only way for these circles to cover each other exactly (meaning everything in circle A is in circle B and everything in circle B is in circle A), is if both regexes are completely the same! Like (.)01 and...... (.)01! This is the only possible match, but if you're treating one like a regex and one like a String, it still won't work.

EDIT If you just want to find whether there is at least one common match, this can be helpful: https://stackoverflow.com/a/17957180/1283166
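If the patterns really are limited to literal characters plus `(.)` and `(.*)`, as the question's comments suggest, a "common match exists" test does not need a general regex-intersection algorithm. The sketch below is one possible approach under that assumption, not a general solution. It also assumes that the literals never contain `*` or `?`. The names `PatternOverlap`, `toGlob` and `overlaps` are invented for this example.

```java
final class PatternOverlap {
    // Rewrites "(.*)" to '*' and "(.)" to '?', so that each char is one token.
    static String toGlob(String pattern) {
        return pattern.replace("(.*)", "*").replace("(.)", "?");
    }

    // True if at least one string is matched by both patterns.
    static boolean overlaps(String left, String right) {
        String a = toGlob(left);
        String b = toGlob(right);
        int n = a.length();
        int m = b.length();
        // ok[i][j]: can a.substring(i) and b.substring(j) match a common string?
        boolean[][] ok = new boolean[n + 1][m + 1];
        for (int i = n; i >= 0; i--) {
            for (int j = m; j >= 0; j--) {
                boolean aStar = i < n && a.charAt(i) == '*';
                boolean bStar = j < m && b.charAt(j) == '*';
                if (i == n && j == m) {
                    ok[i][j] = true; // both patterns fully consumed
                } else if (aStar) {
                    // '*' matches nothing, or absorbs the next token of b
                    ok[i][j] = ok[i + 1][j] || (j < m && ok[i][j + 1]);
                } else if (bStar) {
                    ok[i][j] = ok[i][j + 1] || (i < n && ok[i + 1][j]);
                } else if (i < n && j < m) {
                    char x = a.charAt(i);
                    char y = b.charAt(j);
                    ok[i][j] = (x == '?' || y == '?' || x == y) && ok[i + 1][j + 1];
                }
                // otherwise one side ran out while the other still needs a character
            }
        }
        return ok[0][0];
    }

    public static void main(String[] args) {
        System.out.println(overlaps("(.)01", "2(.)1")); // prints true
        System.out.println(overlaps("(.)02", "101"));   // prints false
    }
}
```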

Davio
  • OP doesn't want to check if `A` == `B` but to check if intersection of `A` and `B` is not empty (if there exist some common matches). – Pshemo Jul 01 '14 at 12:36

You do not need any regular expression comparison in this scenario. A simple algorithm will work.

For each wildcard, remove it from its string and remove the character at the same position from the other string, then compare what is left with the equals() method.

Something like this might help you if there is no more than one occurrence of wild card in each string:

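    final String WILD_CARD = "(.)";
    String str1 = "(.)01";
    String str2 = "2(.)1";
    int index = -1;

    // If str1 has a wildcard, drop it and drop the character that sits
    // at the same index in str2.
    // Caveat: replace() removes every occurrence of that character.
    if ((index = str1.indexOf(WILD_CARD)) != -1) {
        str1 = str1.replace(WILD_CARD, "");
        str2 = str2.replace(String.valueOf(str2.charAt(index)), "");
    }

    // Same step in the other direction, for a wildcard in str2.
    if ((index = str2.indexOf(WILD_CARD)) != -1) {
        str2 = str2.replace(WILD_CARD, "");
        str1 = str1.replace(String.valueOf(str1.charAt(index)), "");
    }

    // Whatever is left consists of literals only and must be identical.
    if (str1.equals(str2)) {
        System.out.println("Yes");
    } else {
        System.out.println("No");
    }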
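
A more defensive variant of the same idea is sketched below. It collapses every `(.)` to a single placeholder character and then compares the two strings position by position, so repeated characters and several wildcards per string are handled. It covers `(.)` only, not `(.*)`, and assumes `?` never occurs as a literal. `WildcardCompare`, `normalise` and `compatible` are illustrative names.

```java
final class WildcardCompare {
    static final String WILD_CARD = "(.)";

    // Collapses each "(.)" to a single '?' so both strings line up by position.
    static String normalise(String s) {
        return s.replace(WILD_CARD, "?");
    }

    static boolean compatible(String left, String right) {
        String a = normalise(left);
        String b = normalise(right);
        if (a.length() != b.length()) {
            return false; // "(.)" always stands for exactly one character
        }
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x != '?' && y != '?' && x != y) {
                return false; // two different literals at the same position
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(compatible("(.)01", "2(.)1") ? "Yes" : "No"); // prints Yes
    }
}
```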
Sanjeev
  • What if both sides have `(.*)`? – Pshemo Jul 01 '14 at 12:21
  • I could have written a full implementation, but I purposely left that for the OP to implement. – Sanjeev Jul 01 '14 at 12:23
  • This solution is incorrect. If str1 is `1(.)1` and str2 is `1(.)1` then the first if block would modify them to str1 = `11` str2 = `1.)1`. The second if statement would do nothing and the last one would conclude they're not equal. There are also errors if there are multiple `(.)` in either string, or if str1 = `(.)11` and str2 = `111`, as the str2.replace would replace all the ones in str2 with the empty string. – Alderath Jul 01 '14 at 12:38
  • @Alderath I agree with you. I already said this is not a full implementation. This might be a way to implement what OP needs. – Sanjeev Jul 01 '14 at 12:43