12

Regex experts please help to see if this problem can be solved by regex:

Given that string 1 is any string

and string 2 is any string containing all parts of string 1 (but not as a simple substring match; I will give an example),

how can I use a regex to replace all parts of string 1 in string 2 with blank, so that what remains is the part of string 2 that is not in string 1?

For example: str1 = "test xyz"; str2 = "test ab xyz"

I want " ab" or "ab " back. What is the regex I can write so that when I run a replace function on str2, it will return " ab"?

Here is some non-regex code:

            function findStringDiff(str1, str2) {
                var compareString = function(str1, str2) {
                    var a1 = str1.split("");
                    var a2 = str2.split("");
                    var idx2 = 0;
                    a1.forEach(function(val) {
                        if (a2[idx2] === val) {
                          a2.splice(idx2,1);
                        } else {
                            idx2 += 1;
                        }
                    });
                    if (idx2 > 0) {
                        a2.splice(idx2,a2.length);
                    }
                    return a2.join("");
                }

                if (str1.length < str2.length) {
                    return compareString(str1, str2);
                } else {
                    return compareString(str2, str1);
                }
            }

            console.log(findStringDiff("test xyz","test ab xyz"));
techguy2000
  • 13
    I don't see how regular expressions would be helpful here at all. – Ja͢ck Apr 11 '15 at 03:21
  • 3
    Btw, the algorithm you have shown here would make it seem that there are no differences between `'$1.00'` and `'00.1$'`. – Ja͢ck Apr 11 '15 at 03:24
  • "Or easier and faster code?" --- is it complicated or slow? – zerkms Apr 11 '15 at 03:25
  • What is the "difference" of two strings? The characters present in one that are not present in the other at the same frequency? The result obtained by finding the longest common subsequence and removing it? Or another definition? – Millie Smith Apr 11 '15 at 03:32
  • 1
    The code above even thinks that "ab" and "cd" are the same. – Millie Smith Apr 11 '15 at 03:34
  • Very good point. The idea is to find that there is an extra dot in the second string. What is the best way to do this? Thanks! – techguy2000 Apr 11 '15 at 04:01
  • 1
    What is the expected output? – Mulan Apr 11 '15 at 04:14
  • In my example, I want the extra . – techguy2000 Apr 11 '15 at 04:22
  • The new code is no good either; it will modify a2, so the indexes will be off. – James Wilkins Apr 11 '15 at 04:24
  • I think what you're really trying to do is "detect the difference between two strings". In which case, you may be interested in [this StackOverflow question of the same title](http://stackoverflow.com/questions/18050932/detect-differences-between-two-strings-with-javascript). – gfullam Apr 11 '15 at 04:27
  • OK. Thanks for pointing out the problem with the code. Sorry about that. But my question is: given two strings, is there a way using the regex to find the difference? – techguy2000 Apr 11 '15 at 04:28
  • With RegEx, you could come up with an expression to specifically detect extra dots, or extra dollars signs, or extra digits, etc.; It is used for pattern matching, not general comparison. – gfullam Apr 11 '15 at 04:30
  • regex is for seeing if a string matches a pattern. You can't use it to compare strings. – Millie Smith Apr 11 '15 at 04:31
  • Finally I am getting comments on regex. Thanks! I hope more regex experts can share their thoughts! – techguy2000 Apr 11 '15 at 04:33
  • 2
    Can you give multiple examples with more than just a one character difference? It's still unclear what you want. – Millie Smith Apr 11 '15 at 04:33
  • I'm joining Millie's cause. It's not clear at all. Point, no point? Or *any* string? – Roko C. Buljan Apr 11 '15 at 04:34
  • If you want to learn about regex, read this: http://en.m.wikipedia.org/wiki/Regular_language. It's valuable to know but won't do you any good here. – Millie Smith Apr 11 '15 at 04:36
  • Let's say str1 = "test xyz". And str2 = "test ab xyz". I want " ab" back. Or "ab " back. The idea is that between the 2 strings, the difference is " ab". "test" + " xyz" = "test xyz" and "test" + " ab" + " xyz" = "test ab xyz". – techguy2000 Apr 11 '15 at 04:40
  • Is it always in money format? First you ask for difference of strings, then just for the extra '.', and now it seems the difference between specific string values. You need to be more clear on what the input can be. You CANNOT compare TWO strings using regex alone. You need to run regex on each string to test it, or break it up - which you were doing already with 'split()'. – James Wilkins Apr 11 '15 at 04:43
  • If you have abcd and bde, what do you want the result to be? If you have bd and dbe, what is your desired result? What about if we swap the strings in the examples to where string 1 is string 2 and string 2 is string 1? – Millie Smith Apr 11 '15 at 04:48
  • I cannot imagine why you want to use regex so bad here. It sounds like you just need to walk both strings from left to right at the same time, increasing string 2's index when the current characters are different, and increasing both indices when the characters are the same. whenever you just move string 2's index forward, append the character from string 2 to your result. Even if you *could* somehow do this with regex, it would be a less readable solution. – Millie Smith Apr 11 '15 at 05:25
  • I am hoping some regex expert will tell me this is just str = str.replace(someRegex, somestuff); If so, this will be just one line right? – techguy2000 Apr 11 '15 at 05:30
  • It doesn't take much to be a regex "expert". For the sake of this argument, let's assume I'm an expert, whether I am or not. No, you cannot do this in one line with a regex. Impossible. Done. If the answer to a question is "no", but you wait around for someone who has more expertise than every answerer before to tell you the answer is yes, then you will wait forever, because the answer is no. – Millie Smith Apr 11 '15 at 05:42
  • 1
    If you need me to prove my "expertise", regexes in their basic form describe a regular language, accepting or rejecting strings in that language. I have covered regular languages as well as similar languages and ways to write the code to recognize these languages in multiple classes in undergrad and at least once in grad school. This is getting ridiculous. – Millie Smith Apr 11 '15 at 05:45
  • @MillieSmith Thanks Millie. I just needed to know if it's possible or not with regex. Sometimes regex is like magic to me :) – techguy2000 Apr 11 '15 at 05:49
  • 1
    Ok. yw. Go read up on regular languages and it won't be magical any more – Millie Smith Apr 11 '15 at 05:50
  • I have used regex every now and then. For some reason, it just doesn't stick very well with me... But I am still fascinated by it. I will have to read it up more... – techguy2000 Apr 11 '15 at 05:54
  • @techguy2000 Finally regex is possible. Look at my answer – Lorenz Meyer Apr 11 '15 at 07:15

4 Answers

22

Regexes only recognize if a string matches a certain pattern. They're not flexible enough to do comparisons like you're asking for. You would have to take the first string and build a regular language based on it to recognize the second string, and then use match groups to grab the other parts of the second string and concatenate them together. Here's something that does what I think you want in a readable way.

//assuming "b" contains a subsequence containing 
//all of the letters in "a" in the same order
function getDifference(a, b)
{
    var i = 0;
    var j = 0;
    var result = "";

    while (j < b.length)
    {
        if (a[i] != b[j] || i == a.length)
            result += b[j];
        else
            i++;
        j++;
    }
    return result;
}

console.log(getDifference("test fly", "test xy flry")); // logs "xy r"

Here's a jsfiddle for it: http://jsfiddle.net/d4rcuxw9/1/
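The function above relies on the assumption stated in its comments: "a" has to appear in "b" as a subsequence, and "a" has to be passed first. A minimal sketch of a wrapper that checks this is below. The name getDifferenceChecked and the choice to return null on failure are my own assumptions for illustration, not part of the answer's code.

```javascript
// Hypothetical wrapper, not part of the original answer.
// Walks the longer string and collects the characters that are not
// consumed by the shorter string. Returns null when the shorter string
// is not a subsequence of the longer one.
function getDifferenceChecked(s1, s2) {
    var a = s1.length <= s2.length ? s1 : s2; // shorter string
    var b = s1.length <= s2.length ? s2 : s1; // longer string
    var i = 0;
    var result = "";

    for (var j = 0; j < b.length; j++) {
        if (i < a.length && a[i] === b[j]) {
            i++;            // character consumed from both strings
        } else {
            result += b[j]; // extra character found only in b
        }
    }
    // If part of a was never matched, a is not a subsequence of b.
    return i === a.length ? result : null;
}

console.log(getDifferenceChecked("test xyz", "test ab xyz")); // "ab "
console.log(getDifferenceChecked("$1..00", "$1.00"));          // "."
console.log(getDifferenceChecked("ab", "cd"));                 // null
```

Returning null instead of a partial result makes cases like "ab" versus "cd" visible to the caller rather than silently reporting a difference.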

Millie Smith
  • I see. j is the index for b, and i for a. You are looping through the longer string and storing the "not found/different" char in result. I like it. Since regex is not possible, I'll mark this as my preferred answer. Thanks Millie! – techguy2000 Apr 11 '15 at 06:20
  • 1
    I know that I'm extremely late and this question is closed, but just in case someone wants to find the difference between two strings regardless of the order of the characters: https://jsfiddle.net/c8xchkxq/ – Pedro Corso Apr 19 '17 at 15:15
  • 1
    Nice and simple solution, thanks! I needed the same on word level, and wanted to also receive the positions of the added words. If someone else is interested, see: http://jsfiddle.net/409doc37/ – Heribert Oct 11 '19 at 08:31
1

I find this question really interesting. Even though I'm a little late, I would like to share my solution on how to accomplish this with regex. The solution is concise but not very readable.

While I like it for its conciseness, I probably would not use it in my code, because its opacity reduces maintainability.

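var str1 = "test xyz",
    str2 = "test ab xyz",
    replacement = '';
// Put a capture group between every pair of characters in str1,
// escaping regex metacharacters so they are matched literally.
var regex = new RegExp(str1.split('').map(function(char){
    return char.replace(/[.(){}+*?[|\]\\^$]/, '\\$&');
}).join('(.*)'));
if(regex.test(str2)){
    // Keep only the captured gaps: "$1$2...", one reference per group.
    for(var i=1; i<str1.length; i++) replacement = replacement.concat('$' + i);
    var difference = str2.replace(regex, replacement);
} else {
    alert ('str2 does not contain str1');
}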

The regular expression for "test xyz" is /t(.*)e(.*)s(.*)t(.*) (.*)x(.*)y(.*)z/ and replacement is "$1$2$3$4$5$6$7".

The code is no longer concise, but it works now even if str1 contains special characters.
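For reuse, the same idea can be wrapped in a function. The following is only a sketch under my own assumptions: the names escapeRegExp and regexDifference are hypothetical, the gaps use the lazy `[\s\S]*?` instead of `(.*)` so that line breaks are also captured, and the captures are joined directly instead of building a "$1$2..." replacement string (as far as I know, `$nn` references in a replacement string only go up to two digits, so long inputs could run into that limit).

```javascript
// Hypothetical helper: escape regex metacharacters in a string.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical function: returns the parts of str2 that are left over
// after matching the characters of str1 in order, or null when str1
// is not contained in str2 as a subsequence.
function regexDifference(str1, str2) {
    // "test" becomes /t([\s\S]*?)e([\s\S]*?)s([\s\S]*?)t/
    var pattern = str1.split('').map(escapeRegExp).join('([\\s\\S]*?)');
    var match = new RegExp(pattern).exec(str2);
    if (match === null) {
        return null;
    }
    // Text before the match + every captured gap + text after the match.
    return str2.slice(0, match.index) +
        match.slice(1).join('') +
        str2.slice(match.index + match[0].length);
}

console.log(regexDifference("test xyz", "test ab xyz")); // "ab "
console.log(regexDifference("$1.00", "$1..00"));         // "."
console.log(regexDifference("abc", "acb"));              // null
```

With lazy groups the first example yields "ab " rather than " ab"; both are valid answers to the question. Patterns with this many wildcard groups can backtrack heavily on long inputs, so this is better suited to short strings.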

Lorenz Meyer
  • I first thought it was limited to 10 characters for str1. But I just learnt that JavaScript allows for back references with numbers larger than 9. – Lorenz Meyer Apr 11 '15 at 07:08
  • This doesn't find the difference between `test xyz` vs `test xy` and `test{2 spaces}xyz` vs `test xyz`. – James Wilkins Apr 11 '15 at 07:34
  • @LorenzMeyer I am pretty excited. I think you are on to something. But when I use var str1 = "$1.00", str2 = "$1..00", it's not finding the dot. I hope you can come up with a robust solution -- so you are dynamically constructing the regex based on the str1 input, interesting... – techguy2000 Apr 11 '15 at 07:45
  • Yes, it does not find the dot, because a dot is a special character in regexes. It would not work for `(){}+*\[]` either. For a robust solution, we need to escape all of those special characters. – Lorenz Meyer Apr 11 '15 at 07:52
  • @james str1 must be contained in str2. Your examples do work with `var str1 = 'test xy', str2 = 'test xyz';` and `var str1 = 'test xyz', str2 = 'test xyz';`. Was this the down vote? – Lorenz Meyer Apr 11 '15 at 08:31
  • @techguy Did you see? I modified the code to always work. – Lorenz Meyer Apr 11 '15 at 10:38
  • I'll upvote. You did what OP wanted. – Millie Smith Apr 11 '15 at 14:24
  • 1
    @LorenzMeyer Did you mean you updated the code to handle special characters like dot or dollar sign? I tried dot and dollar sign and the code is not working. http://jsfiddle.net/mnzhbz7o/ – techguy2000 Apr 11 '15 at 15:58
  • How about swapping the strings based on length, and putting the smallest string first, then escaping the special characters, like $ and . – James Wilkins Apr 11 '15 at 16:53
  • @techguy2000 Too bad, I missed the special characters `^` and `$`. I hope, now I've got all of them. http://jsfiddle.net/mnzhbz7o/1/. – Lorenz Meyer Apr 12 '15 at 06:15
  • @LorenzMeyer Thanks for trying! I think you have definitely pointed me in the right direction with regex. My understanding thus far is that this can't be solved by a simple static regex. The regex would have to be dynamically generated based on the search string. Is that right? And while your code works for simple cases, it breaks here http://jsfiddle.net/mnzhbz7o/2/ But I think with more tweaking, it might work... I'll have to study what you have done :) – techguy2000 Apr 12 '15 at 06:43
  • 'it breaks'. It works, but does not have a special behavior on word boundaries. I didn't consider this as a requirement. If you change `(.*)` to `(.*?)`, it will work for your example, but you'd find other cases where it doesn't. – Lorenz Meyer Apr 12 '15 at 07:48
  • About dynamic regexes : I already used this, but it can get difficult to read and maintain. On the other hand, it can be a powerful tool. – Lorenz Meyer Apr 12 '15 at 07:52
-2

To find out if there are extra '.' like you are asking for, you can do this:

result = "$1...00".match(/\$1\.(\.*)?00/)[1];

result then holds the EXTRA '.'s found. You cannot compare two strings using regex alone. Perhaps run this on each string, then compare the results.

You can also try this:

result = "$1...00".match(/(\$)(\d+)\.(\.*)?(\d+)/);
// Outputs: ["$1...00", "$", "1", "..", "00"]

Which will extract the various parts to compare.

James Wilkins
-2

If you are only concerned with testing whether a given string contains two or more sequential dot '.' characters:

var string = '$1..00',
    regexp = /(\.\.+)/;

alert('Is this regular expression ' + regexp + ' found in this string ' + string + '?\n\n' + regexp.test(string) + '\n\n' + 'Match and captures: ' + regexp.exec(string));

If you need it to match the currency format:

var string = '$1..00',
    regexp = /\$\d*(\.\.+)(?:\d\d)+/;

alert('Is this regular expression ' + regexp + ' found in this string ' + string + '?\n\n' + regexp.test(string) + '\n\n' + 'Match and captures: ' + regexp.exec(string));

But I caution you that Regular Expressions aren't for comparing the differences between two strings; they are used for defining patterns to match against given strings.

So, while this may directly answer how to find the "multiple dots" pattern, it is useless for "finding the difference between two strings".

The StackOverflow tag wiki provides an excellent overview and basic reference for RegEx. See: https://stackoverflow.com/tags/regex/info

gfullam
  • 1
    The question was about comparing two strings, not just removing a string. – Lorenz Meyer Apr 11 '15 at 06:05
  • @LorenzMeyer See above where I explained: 'But I caution you that Regular Expressions aren't for comparing the differences between two strings; they are used for defining patterns to match against given strings. So, while this may directly answer how to find the "multiple dots" pattern, it is useless for "finding the difference between two strings".' – gfullam Apr 12 '15 at 05:10
  • @LorenzMeyer Also note my early comments on the OP's question above, where it is also worth noting that the question was refined multiple times during which an insistence on a RegEx solution specifically for the "multiple dots" pattern was conveyed. The question was later put on hold for being unclear. – gfullam Apr 12 '15 at 05:12
  • @LorenzMeyer Lastly, see [When should I vote down?](http://stackoverflow.com/help/privileges/vote-down) Where one is instructed to "use your downvotes whenever you encounter an egregiously sloppy, no-effort-expended post, or an answer that is clearly and perhaps dangerously incorrect." Considering this is a good-faith effort to provide a working solution to a specifically asked for portion of the OP's unclear question with a clear explanation, I am surprised you found it to be egregiously sloppy, no-effort-expended and perhaps dangerously incorrect. – gfullam Apr 12 '15 at 05:19
  • @gfullam I tried to vote it up but I don't have enough reputation :) However, my question has always been about using a regular expression to find the difference between two strings -- it's in the title. The first example I gave was about str1="$1.00" while str2="$1..00". So I think that's the confusion. Finding a double . is simple, but what I have really been interested in from day 1 is to somehow replace $1.00 WITHIN $1..00, so that only a . is left. I also got a down vote for asking a question -- I don't really think much of it :) And I know you are trying to help. Thanks! – techguy2000 Apr 12 '15 at 06:50