So, say I'm parsing the following HTML string:
<html>
<head>
RANDOM JAVASCRIPT AND CSS AHHHHHH!!!!!!!!
</head>
<body>
<table class="table">
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
</table>
<body>
</html>
and I want to isolate the contents of ** (everything inside of the table class)
Now, I used regex to accomplish this:
string pagesource = (method that extracts the html source and stores it into a string);
string[] splitSource = Regex.Split(pagesource, "<table class=/"member/">;
string memberList = Regex.Split(splitSource[1], "</table>");
//the list of table members will be in memberList[0];
//method to extract links from the table
ExtractLinks(memberList[0]);
I've been looking at other ways to do this extraction, and I came across the Match object in C#.
I'm attempting to do something like this:
Match match = Regex.Match(pageSource, "<table class=\"members\">(.|\n)*?</table>");
The purpose of the above was to hopefully extract a match value between the two delimiters, but, when I try to run it the match value is:
match.value = </table>
MY question, as such, is: is there a way to extract data from my string that is slightly easier/more readable/shorter than my method using regex? For this simple example, regex is fine, but for more complex examples, I find myself with the coding equivalent of scribbles all over my screen.
I would really like to use match, because it seems like a very neat and tidy class, but I can't seem to get it working for my needs. Can anyone help me with this?
Thank you very much!