3

I wrote a regex pattern and tested it on this site: http://rubular.com/

I entered the pattern into the first box on that site exactly like this:

<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>

I left the second box empty.

According to that site, my pattern works perfectly.

But I can't get it working in C#.

I'm trying this:

WebClient client = new WebClient();

string MainPage = client.DownloadString("http://www.vatanbilgisayar.com/cep-telefonu-modelleri/");

string ItemPattern = "<div class=\"product clearfix\">\\n+" +   //  <div class="product clearfix">\n
                "<div class=\"img\">\\n" +                  //  <div class="img">\n
                "+<a href=\"(.*?)\">\\n" +                  //  +<a href="(.*?)">\n
                "+<img class=\"lazyload\"" +                //  +<img class="lazyload"
                "id='.*' data-original=\"(.*?)\"" +         //  id='.*' data-original="(.*?)"
                "alt=\".*\" title=\"(.*?)\"\\/>";           //  alt=".*" title="(.*?)" \/>

MatchCollection matches = Regex.Matches(MainPage, ItemPattern);

foreach (Match match in matches)
{
    Console.WriteLine("Area Code:        {0}", match.Groups[1].Value);
    Console.WriteLine("Telephone number: {0}", match.Groups[2].Value);
    Console.WriteLine();
}

I simply escaped every " with \. I really don't understand why it's not working, and it's starting to drive me crazy.

Trax

2 Answers

4

You need two layers of escape sequences: one for C# and one for the regex syntax.

If you want to escape a character at the regex level, you have to escape the \ itself too, so change each \ in those escape sequences to \\.
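The two layers can be seen side by side in a small sketch. It uses a shortened piece of the pattern from the question; the class name `EscapeLayersDemo` and the sample input `/phone-1` are made up for illustration. A verbatim literal (`@"..."`) is one way to drop the C# layer: backslashes are passed through untouched and only `"` has to be doubled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeLayersDemo
{
    // Regular literal: C# turns \\ into one backslash and \" into a quote,
    // so the regex engine receives:  <a href="(.*?)">\n+
    public const string Regular = "<a href=\"(.*?)\">\\n+";

    // Verbatim literal: backslashes are left alone; only " is doubled.
    public const string Verbatim = @"<a href=""(.*?)"">\n+";

    public static void Main()
    {
        // Both literals spell the same pattern text.
        Console.WriteLine(Regular == Verbatim);   // True

        // Hypothetical one-line input, followed by a real newline character.
        Match m = Regex.Match("<a href=\"/phone-1\">\n", Verbatim);
        Console.WriteLine(m.Groups[1].Value);     // /phone-1
    }
}
```

Printing the pattern with `Console.WriteLine` is a quick way to see what the regex engine actually receives after C# has processed the literal.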

2

Use two \'s for every single \ in your string, not counting the escaping you already did for the quotes, since \ is an escape character in C# as well. It looks like this mainly affects \n, which occurs 3 times.

Original String:

<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>

Also, you can break that up over more than one line. C# ignores whitespace between tokens, so just close the quote, add a + at the end of the line, and continue on the next line with another quote. Spaces that belong to the pattern still have to stay inside the quotes.

C# String:

string ItemPattern = "<div class=\"product clearfix\">\\n" +   //  <div class="product clearfix">\n
                    "+<div class=\"img\">\\n" +                 //  +<div class="img">\n
                    "+<a href=\"(.*?)\">\\n" +                  //  +<a href="(.*?)">\n
                    "+<img class=\"lazyload\" " +               //  +<img class="lazyload" (trailing space kept)
                    "id='.*' data-original=\"(.*?)\" " +        //  id='.*' data-original="(.*?)" (trailing space kept)
                    "alt=\".*\" title=\"(.*?)\" \\/>";          //  alt=".*" title="(.*?)" \/>

If you still have a problem with it, something else is wrong, probably around Regex.Matches(MainPage, ItemPattern). From the debugging you did, it sounds like the string is being created successfully but the MatchCollection comes back empty. So the problem is either in how you obtain the matches or in how you reference them.
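One way to narrow this down is to run the pattern against a small string you control and print the assembled pattern before matching. The sketch below is only an illustration: the `Sample` line is a hypothetical piece of markup shaped like what the pattern targets, not taken from the real page, and the names `PatternCheck`, `NoSpaces` and `WithSpaces` are made up. It also shows one thing worth checking when a pattern is split over several literals: a space that sat at a split point has to be kept inside one of the quoted pieces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Hypothetical sample line (an assumption, not the real page content).
    public const string Sample =
        "<img class=\"lazyload\" id='p1' data-original=\"/img/p1.jpg\" alt=\"Phone\" title=\"Phone 1\" />";

    // Pieces joined with no space at the split points.
    public const string NoSpaces =
        "<img class=\"lazyload\"" +
        "id='.*' data-original=\"(.*?)\"" +
        "alt=\".*\" title=\"(.*?)\"\\/>";

    // Same pieces, with the separating spaces kept inside the quotes.
    public const string WithSpaces =
        "<img class=\"lazyload\" " +
        "id='.*' data-original=\"(.*?)\" " +
        "alt=\".*\" title=\"(.*?)\" \\/>";

    // Number of matches of the given pattern in the sample line.
    public static int Count(string pattern)
    {
        return Regex.Matches(Sample, pattern).Count;
    }

    public static void Main()
    {
        Console.WriteLine(NoSpaces);           // shows lazyload"id= with no space
        Console.WriteLine(Count(NoSpaces));    // 0
        Console.WriteLine(Count(WithSpaces));  // 1
    }
}
```

If the pattern matches the small sample but not the downloaded page, the difference is more likely in the page content (for example its line endings or attribute order) than in the escaping.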

peege