I am having difficulty using regular expressions in C#. I need to find a specific string and keep only one specific word from it.

Here is my code:

 reg = new Regex("<td></td><td><Span class=\"abc\"><Span style=\"color:#......;\"><B>(.*?)</td></tr>");

This is the unique string I want to find. Since the color can vary, I put ...... in its place (the color code is always 6 characters), and (.*?) is the specific word I want to save.

Then it goes like this:

this.varToSave = reg.Match(data).Value.Replace("<td></td><td><Span class=\"abc\"><Span style=\"color:#......;\"><B>", "").Replace("</td></tr>", "");

I want to erase everything and keep only the word matched by (.*?), but it doesn't work. It only erases the ("", ""). I think the problem is the "......" in the Replace call, but I don't know how to fix it.

Thanks in advance.

  • Don't parse HTML with regular expressions. See http://stackoverflow.com/a/1732454/960195 for a humorous explanation. – Adam Mihalcin Feb 28 '12 at 21:48
  • If you don't have to use Regex, is a "Web Scraper" what you're looking for? Perhaps: http://stackoverflow.com/questions/4377355/i-need-a-powerful-web-scraper-library – Jason Feb 28 '12 at 21:51
  • One more link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1758162#1758162 Have you tried [HTML Agility Pack](http://htmlagilitypack.codeplex.com/)? – L.B Feb 28 '12 at 21:55
  • Thank you for the links, I'll check them out. – user1238882 Feb 28 '12 at 22:23

1 Answer

I hope this helps. It will extract the text that follows the <B> tag for you. I'm not sure whether you wanted to extract it or replace it. Either way, this should help:

        var textInput = "<td></td><td><Span class=\"abc\"><Span style=\"color:#......;\"><B>XYZ</td></tr>";
        var reg = new Regex(@"\<B\>(?<myText>.+?)\</td\>\</tr\>$");

        var matches = reg.Matches(textInput);

        Console.WriteLine("Text found was '{0}'", matches[0].Groups["myText"].Value);
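If the markup sits in the middle of a larger HTML page, as the question describes, the `$` anchor above would probably stop the pattern from matching. Note also that `String.Replace` matches its argument literally, so the six dots in the question's Replace call are not treated as wildcards. Below is a minimal sketch that reuses the question's pattern and reads the word straight from capture group 1, so no Replace is needed. The names `KeywordExtractor` and `TryExtract` and the sample `data` string are made up for illustration, and the `[0-9A-Fa-f]{6}` class assumes the color is always a six-digit hex code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordExtractor
{
    // Pattern from the question. [0-9A-Fa-f]{6} (an assumption) replaces the
    // six dots so that only six-digit hexadecimal color codes are accepted.
    private static readonly Regex KeywordPattern = new Regex(
        "<td></td><td><Span class=\"abc\"><Span style=\"color:#[0-9A-Fa-f]{6};\"><B>(.*?)</td></tr>");

    // Tries to find the markup anywhere in 'data'. On success, 'keyword' holds
    // the text captured by (.*?); otherwise it is set to an empty string.
    public static bool TryExtract(string data, out string keyword)
    {
        Match match = KeywordPattern.Match(data);
        keyword = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        // Hypothetical page fragment with extra markup before and after the target.
        string data = "<tr><td></td><td><Span class=\"abc\"><Span style=\"color:#FF0000;\"><B>XYZ</td></tr><tr><td>other</td></tr>";

        if (TryExtract(data, out string keyword))
        {
            Console.WriteLine("Text found was '{0}'", keyword);
        }
        else
        {
            Console.WriteLine("No match");
        }
    }
}
```

Checking `match.Success` first avoids the exception that indexing an empty match collection would throw when the page does not contain the markup.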

Good luck.

Rob Smyth
  • Thanks, but maybe you didn't understand what I meant (my English is bad ^^). I have an HTML page and I want to retrieve a unique word from it (I don't know in advance what it is, I just know that it will be surrounded by the HTML code I posted in my original post). So I don't need an array or collection, since there will be only one match. I spot that code, and in the middle of it is the keyword I want to save, so I replace everything with "" (in other words, erase it) except my keyword. Then only my keyword is left, and I save it. Basically, that's what I want to do. – user1238882 Feb 28 '12 at 22:20