How to get a div's content using Regex or Substring?

Question

I have run into a problem trying to use Regex to copy a string out of an HTML string, and it's not giving me what I need.

I am trying to get the table out of a div, and I have tried this with Regex:

string strLook = respData2;
string s = Regex.Match(strLook, "<div id='results'(.+)</div>", RegexOptions.Singleline).Groups[1].Value;

This just gives me the whole HTML string.
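For reference, a greedy `.+` runs to the last `</div>` in the input, which would explain getting nearly the whole page back. The sketch below swaps in the lazy `.+?` so the match stops at the first `</div>` instead. The class and method names are hypothetical, and it assumes the div contains no nested divs and uses single-quoted attributes exactly as in the pattern above; it illustrates the quantifier difference and is not a robust way to handle HTML.

```csharp
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Hypothetical helper: returns the text between the opening
    // <div id='results' ...> tag and the FIRST following </div>,
    // or null when the pattern does not match.
    public static string FirstResultsDiv(string html)
    {
        // [^>]*> consumes the rest of the opening tag;
        // (.+?) is lazy, so it stops at the first </div> it can reach.
        Match m = Regex.Match(
            html,
            "<div id='results'[^>]*>(.+?)</div>",
            RegexOptions.Singleline);

        return m.Success ? m.Groups[1].Value : null;
    }
}
```

If the results div itself contains another div, this stops too early, which is one reason the comments point to a real HTML parser.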

I have also tried using Substring:

int starPos = strLook.LastIndexOf("<div id='results'>") + "<div id='results'>".Length + 1;
int length = strLook.IndexOf("</div>") - starPos;
string sub = strLook.Substring(starPos, length);

When I step through the Substring code, it says that the last index is 18. I know the div is not 18 characters from the beginning (unless I am wrong about what the 18 is for), and this isn't getting the div either.
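One possible reading of the 18, assuming the real markup does not contain the exact text `<div id='results'>` (for example because it uses double quotes or extra attributes): `LastIndexOf` returns -1 when nothing is found, and the marker is 18 characters long, so -1 + 18 + 1 = 18. Separately, `IndexOf("</div>")` with no start index searches from the beginning of the string, so it can find a closing tag that comes before the div. The sketch below checks for "not found" and searches for the end marker from the start position. `Between` is a hypothetical helper name, and it still stops at the first `</div>`, so nested divs are not handled.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper: returns the text between the first occurrence of
    // startMarker and the next occurrence of endMarker after it,
    // or null if either marker is missing.
    public static string Between(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;          // marker not found: IndexOf gave -1

        start += startMarker.Length;         // first character after the marker

        // Search for the end marker only AFTER the start position.
        int end = html.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return html.Substring(start, end - start);
    }
}
```

Called as `SubstringDemo.Between(strLook, "<div id='results'>", "</div>")`, a null result signals that the marker text does not match the page as written.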

So how do I get the contents of the div, which is an HTML table, so I can write it to a file?

Thanks

I see two problems at a glance: 1. Using regex to parse HTML 2. Using the greedy quantifier `.+` — Jerry, Sep 03 '14 at 12:52
This is the age-old 'parsing HTML with regex' subject. Google it, it'll become quickly apparent that you shouldn't. You may even find yourself at one of SO's infamous answers along the way. — , Sep 03 '14 at 12:52
You should take a look at HTML Agility Pack http://htmlagilitypack.codeplex.com which is a powerful and light tool to work with HTML — Aymeric, Sep 03 '14 at 12:52
@Aymeric, I just started using HAP the other day. I can use HAP to extract the table, but since I will be having multiple tables, I need to grab the results div, place all the ones I need into a single file, then use HAP to extract the table. Or maybe I am going about it the wrong way. I can't find any good tutorials on it, and the documentation on its site is nil — Chris, Sep 03 '14 at 12:57
@Jerry, I would rather use SubString than RegEx because from what I hear RegEx can take a long time — Chris, Sep 03 '14 at 12:58
http://stackoverflow.com/questions/846994/how-to-use-html-agility-pack did you go through this SO post? — Aymeric, Sep 03 '14 at 12:59
There are plenty of links on this topic which are pretty explicit, and honestly I'm almost 100% sure you won't find a better tool than HAP for working with HTML — Aymeric, Sep 03 '14 at 13:02
I like what I am seeing with HAP so far, but I'm going to have about 50 html strings that I need to extract the tables from then put that into a datatable. — Chris, Sep 03 '14 at 13:04
This is the comp sci 101 solution, not the best, but if this is a one-time thing and you don't want to use a tool, you need to account for div tags inside the div you want. So make an open-div-count variable and set it to 1. Then compare the indexes of the next "<div" and the next "</div>": if the open div tag comes first (its index is less), add 1 to your open-div-count, else subtract 1. Either way, add the shorter substring to your result, rinse and repeat. You're done when your open-div-count variable is 0. — Dudeman3000, Sep 03 '14 at 15:12
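The counting approach in the last comment could be sketched roughly as follows. The names `DivExtractor` and `GetDivContent` are hypothetical, and the code assumes reasonably well-formed markup: it treats any text starting with `<div` as an opening tag and does not account for comments, scripts, or a `>` inside an attribute value.

```csharp
using System;

public static class DivExtractor
{
    // Hypothetical helper: returns the inner HTML of the first div whose
    // opening tag begins with openTagPrefix (e.g. "<div id='results'"),
    // or null if it is not found or the divs are unbalanced.
    public static string GetDivContent(string html, string openTagPrefix)
    {
        int start = html.IndexOf(openTagPrefix, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;

        // Content begins right after the '>' that ends the opening tag.
        int contentStart = html.IndexOf('>', start);
        if (contentStart < 0) return null;
        contentStart++;

        int depth = 1;            // the open-div-count from the comment
        int pos = contentStart;

        while (depth > 0)
        {
            int nextOpen = html.IndexOf("<div", pos, StringComparison.OrdinalIgnoreCase);
            int nextClose = html.IndexOf("</div", pos, StringComparison.OrdinalIgnoreCase);
            if (nextClose < 0) return null;   // ran out of closing tags

            if (nextOpen >= 0 && nextOpen < nextClose)
            {
                depth++;                      // a nested div opens first
                pos = nextOpen + 4;           // skip past "<div"
            }
            else
            {
                depth--;                      // a div closes first
                if (depth == 0)
                    return html.Substring(contentStart, nextClose - contentStart);
                pos = nextClose + 5;          // skip past "</div"
            }
        }

        return null;
    }
}
```

Instead of appending substrings piece by piece, this version only tracks positions and takes one `Substring` at the end, which yields the same text.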

score 0 · Answer 1 · edited Nov 28 '17 at 05:16

Don't parse HTML with Regular Expressions. Use HTML Agility Pack instead.

Old project page: http://htmlagilitypack.codeplex.com/
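Since the comments mention that tutorials are hard to find, here is a minimal sketch of pulling the table out of the results div with HTML Agility Pack. It assumes the `HtmlAgilityPack` NuGet package is referenced and that the div really has `id="results"`; the class and method names are hypothetical.

```csharp
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class ResultsExtractor
{
    // Hypothetical helper: returns the HTML of the first <table> inside
    // the div with id "results", or null if either element is missing.
    public static string GetResultsTable(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: any div in the document whose id attribute is "results".
        HtmlNode div = doc.DocumentNode.SelectSingleNode("//div[@id='results']");
        if (div == null) return null;

        // ".//table" searches only below the div that was found.
        HtmlNode table = div.SelectSingleNode(".//table");
        return table?.OuterHtml;
    }
}
```

With around 50 HTML strings, the same method could be called in a loop and each non-null result appended to a file, for example with `File.AppendAllText`, which avoids the intermediate step of collecting the divs into one file first.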

edited Nov 28 '17 at 05:16

wp78de

answered Sep 03 '14 at 12:56

Samuel Neff

I can't find many tutorials for HAP, and the site has no documentation – Chris Sep 03 '14 at 12:58
