Regex matches not working

Question

I am doing a very simple task: parsing a website, looking for

<tbody>this is what important for me</tbody>

and returning its content, but I just cannot make it work. When I do:

Regex.Matches(webData, @"<tbody>(.*?)</tbody>")

it gives me no results. This, however, gives me 2 results:

Regex.Matches(webData, @"tbody")

but again, this

Regex.Matches(webData, @"tbody(.*?)tbody")

gives me nothing (so I assume escaping is not the problem). I found out about (.*?) on this page and assumed it would be pretty easy to use, but I just cannot work it out.

Please update your title http://meta.stackexchange.com/questions/10647/how-do-i-write-a-good-title — Soner Gönül, Apr 14 '13 at 18:30
You need to escape some symbols. Try `"\<tbody\>(.*?)\<\/tbody\>"` instead. — Jerry, Apr 14 '13 at 18:30
Perhaps you should see: [This Answer](http://stackoverflow.com/a/1732454/1465011) — recursion.ninja, Apr 14 '13 at 18:31

Anirudha · Accepted Answer · 2013-04-14T18:43:08.227

Using regex for parsing HTML is not recommended

Regex is meant for regularly occurring patterns. HTML is not regular in its format (except XHTML). For example, HTML files are valid even if a closing tag is missing! This could break your code.
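As a rough illustration of how fragile the pattern from the question is, here is a small sketch. The sample strings are made up for the demo and are not the asker's page. It uses only `System.Text.RegularExpressions` and shows two ways the match can silently fail: content that spans several lines (by default `.` does not match `\n` in .NET unless `RegexOptions.Singleline` is set), and a `<tbody>` whose closing tag is omitted, which HTML allows.

```csharp
using System;
using System.Text.RegularExpressions;

class TbodyRegexDemo
{
    static void Main()
    {
        // Pattern taken from the question.
        const string pattern = @"<tbody>(.*?)</tbody>";

        // Hypothetical sample inputs, invented for this demo.
        string oneLine   = "<table><tbody><tr><td>1</td></tr></tbody></table>";
        string multiLine = "<table><tbody>\n<tr><td>1</td></tr>\n</tbody></table>";
        string noClose   = "<table><tbody><tr><td>1</td></tr></table>";

        // Everything on one line: the pattern finds the element.
        Console.WriteLine(Regex.Matches(oneLine, pattern).Count);   // 1

        // Content spread over lines: '.' stops at '\n', so nothing matches.
        Console.WriteLine(Regex.Matches(multiLine, pattern).Count); // 0

        // Singleline lets '.' match '\n' as well, so the match comes back.
        Console.WriteLine(
            Regex.Matches(multiLine, pattern, RegexOptions.Singleline).Count); // 1

        // Closing tag omitted: no option rescues the pattern.
        Console.WriteLine(
            Regex.Matches(noClose, pattern, RegexOptions.Singleline).Count);   // 0
    }
}
```

If the tbody on the real page spans several lines, that would be consistent with the symptom in the question (two hits for "tbody" but none for the full pattern), though the page itself is not shown, so this is only a guess. A `<tbody>` carrying attributes would also defeat the literal pattern.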

Use an HTML parser like HtmlAgilityPack

You can use this code to retrieve the content of every tbody using HtmlAgilityPack:

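// Requires the HtmlAgilityPack package, plus: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream);

// HtmlAgilityPack exposes element names in lowercase, so the XPath uses "tbody".
// Note: SelectNodes returns null (not an empty collection) when nothing matches.
var tbodyList = doc.DocumentNode.SelectNodes("//tbody")
                  .Select(p => p.InnerText)
                  .ToList();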

tbodyList contains the text of every tbody in the entire document.
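For completeness, a self-contained sketch of the same idea that also guards against the case where the page has no tbody at all. It assumes the HtmlAgilityPack NuGet package is referenced; the helper name `GetTbodyTexts` and the sample HTML are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

class TbodyExtractor
{
    // Hypothetical helper: returns the inner text of every <tbody>,
    // or an empty list when the document contains none.
    static List<string> GetTbodyTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when the XPath matches nothing.
        var nodes = doc.DocumentNode.SelectNodes("//tbody");
        if (nodes == null)
            return new List<string>();

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    static void Main()
    {
        // Made-up input with line breaks inside the tbody.
        string webData =
            "<table><tbody>\n<tr><td>first</td></tr>\n</tbody></table>";

        foreach (string text in GetTbodyTexts(webData))
            Console.WriteLine(text);
    }
}
```

Returning an empty list instead of null keeps the calling code free of null checks; whether that is the right choice depends on how the caller wants to treat "no table found".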

score 2 · Answer 2 · answered Apr 14 '13 at 18:37

To parse a web page, use a real HTML parser like HtmlAgilityPack:

string html = "<tbody>this is what important for me</tbody>";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

var text = doc.DocumentNode.Descendants("tbody").First().InnerText;

score 0 · Answer 3 · answered Apr 14 '13 at 18:43

I recommend HtmlAgilityPack too.

You can also use XPath (http://www.w3schools.com/xpath/).

Building on I4V's example:

var text = doc.DocumentNode.SelectSingleNode("//tbody").InnerText;

briba