
I need to compare the strings below. The problem is that the URL in each string will be different every time, e.g.:

www.google.com
http://www.google.com
google.co.uk!

So Contains() cannot match the strings, because the URLs differ.

String1 = "This is my string http://www.google.co.uk and that was my url"
String2 = "this is my string google.gr and that was my url"

So basically I want to compare the contents of the strings minus the URL. Each string can contain different text each time, so looking for the URL at the same position each time will not work.

I have searched extensively on here for an answer to this problem, but I was unable to find a working solution.

Thanks in advance

johnny 5
  • Can you elaborate on what you consider a match? Does `http://www.google.co.uk` "match" `google.gr`? – Rob Dec 22 '15 at 22:27
  • If all the text in string one matches the text in string two, then it's considered a match. String1 = "**This is my string** http://www.google.co.uk **and that was my url**" String2 = "**this is my string** google.gr **and that was my url**" – johnsmith6 Dec 22 '15 at 22:29
  • Possible duplicate of [Get just the domain name from a URL?](http://stackoverflow.com/questions/2154167/get-just-the-domain-name-from-a-url) – johnny 5 Dec 22 '15 at 22:43
  • It really would help if you explained why you “need” to do this, and what you will be doing with these strings after you compare them. – Dour High Arch Dec 22 '15 at 22:52

2 Answers


Use regular expressions to remove links:

        // Requires: using System.Text.RegularExpressions;
        String string1 = "This is my string http://www.google.co.uk and that was my url";
        String string2 = "this is my string http://google.gr and that was";

        // Matches "http://" followed by any run of non-whitespace characters
        Regex rxp = new Regex(@"http://[^\s]*");
        String clean1 = rxp.Replace(string1, "");
        String clean2 = rxp.Replace(string2, "");

Now you can compare clean1 with clean2. Of course, the regex above is just an example: it only removes URLs starting with "http://". You may need something more sophisticated, depending on your real data.
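A self-contained sketch of this approach follows. It assumes two things the answer leaves open: URLs may start with either `http://` or `https://`, and the comparison should ignore letter case (the question's example strings differ in "This" vs. "this"). `UrlStripper`, `RemoveUrls` and `EqualsIgnoringUrls` are hypothetical names chosen for illustration. Like the answer, it only recognises URLs that carry a scheme, so a bare host such as `google.gr` is not removed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration, not part of any library.
public static class UrlStripper
{
    // Matches "http://" or "https://" followed by any run of non-whitespace characters.
    private static readonly Regex SchemeUrl = new Regex(@"https?://\S*");

    // Used to collapse the double space left behind where a URL was removed.
    private static readonly Regex Spaces = new Regex(@"\s+");

    public static string RemoveUrls(string input)
    {
        string withoutUrls = SchemeUrl.Replace(input, "");
        return Spaces.Replace(withoutUrls, " ").Trim();
    }

    // Assumption: a case-insensitive comparison is what the question wants.
    public static bool EqualsIgnoringUrls(string a, string b)
    {
        return string.Equals(RemoveUrls(a), RemoveUrls(b), StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, `UrlStripper.EqualsIgnoringUrls("This is my string http://www.google.co.uk and that was my url", "this is my string https://google.gr and that was my url")` returns `true`, because both reduce to "this is my string and that was my url" when case is ignored.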

Vir
  • Thanks for your reply. This would not work as the URL can be "google.com" without the "http://" and it can also use any TLD. – johnsmith6 Dec 22 '15 at 22:37
  • Well, you could try the pattern [^\s]+\.[^\s]+, which should match any run of non-whitespace characters that has at least one dot inside it (in practice, a whitespace-delimited token containing a dot). But you need to check it against real use cases, because this time it may be too broad. – Vir Dec 22 '15 at 22:51
  • This answer does not satisfy the requirements of the question!! – johnsmith6 Jan 16 '16 at 22:33
  • @johnsmith6 It does exactly the same as the answer you've accepted :) Also, your requirements didn't say "do my job for me and give me the exact regular expression I need". – Vir Jan 17 '16 at 02:28
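The broader pattern suggested in the comment above can be tried like this. It is only a sketch: `DottedTokenStripper` is a hypothetical name, and the second example deliberately uses "e.g." to show the over-matching the comment warns about, since any whitespace-delimited token with a dot inside it is removed, URL or not.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration, not part of any library.
public static class DottedTokenStripper
{
    // Pattern from the comment: non-whitespace, a literal dot, then more non-whitespace.
    private static readonly Regex DottedToken = new Regex(@"[^\s]+\.[^\s]+");

    // Removes every dotted token; the surrounding spaces are left in place.
    public static string Strip(string input)
    {
        return DottedToken.Replace(input, "");
    }
}
```

`Strip("this is my string google.gr and that was my url")` leaves "this is my string  and that was my url" (note the double space), while `Strip("see e.g. this")` also removes "e.g.", which is exactly the kind of false positive worth testing against real data.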

Using Regular Expressions:

        // Requires: using System; using System.Text.RegularExpressions;
        Regex regex = new Regex(@"\s((?:\S+)\.(?:\S+))");

        string string1 = "This is my string http://www.google.co.uk and that was my url.";
        string string2 = "this is my string google.gr and that was my url.";

        var string1WithoutURI = regex.Replace(string1, ""); // Output: "This is my string and that was my url."
        var string2WithoutURI = regex.Replace(string2, ""); // Output: "this is my string and that was my url."

        // Regex.Replace(string1, @"\s((?:\S+)\.(?:\S+))", ""); // This can be used too to avoid having to declare the regex.

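        // The example strings differ in case ("This" vs "this"), so a plain == would
        // return false here; compare case-insensitively instead.
        if (string.Equals(string1WithoutURI, string2WithoutURI, StringComparison.OrdinalIgnoreCase))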
        {
            // Do what you want with the two strings
        }

Explaining the regex \s((?:\S+)\.(?:\S+))

1. \s matches a single whitespace character (the space before the URL).

2. ((?:\S+)\.(?:\S+)) matches the URL up to the next whitespace character.

2.1. (?:\S+) matches one or more non-whitespace characters; the ?: makes it a non-capturing group.

2.2. \. matches a literal ".", on the assumption that every URL contains at least one dot.

2.3. (?:\S+) again matches one or more non-whitespace characters in a non-capturing group, picking up everything after the dot.

That should do the trick...
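To see exactly what this pattern removes, the sketch below lists the token captured by group 1 for each match. `UrlTokenProbe` and `FindTokens` are hypothetical names. Two behaviours are worth checking against real data: the trailing "!" in `google.co.uk!` is captured as part of the token, and because the pattern begins with `\s`, a URL at the very start of a string (with no whitespace before it) is not matched at all.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration, not part of any library.
public static class UrlTokenProbe
{
    // The answer's pattern: one whitespace character, then a token containing a dot.
    private static readonly Regex UrlToken = new Regex(@"\s((?:\S+)\.(?:\S+))");

    // Returns group 1 (the token without its leading whitespace) for each match.
    public static List<string> FindTokens(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in UrlToken.Matches(input))
        {
            tokens.Add(m.Groups[1].Value);
        }
        return tokens;
    }
}
```

Running `FindTokens("a www.google.com b http://www.google.com c google.co.uk!")` yields the three tokens "www.google.com", "http://www.google.com" and "google.co.uk!", while `FindTokens("www.google.com is first")` yields none.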

Gabriel Duarte