How to write an advanced regular expression?

Question

I'm new to regular expressions (C#). I need to get the brand names out of an HTML document. I'm using

 MatchCollection m1 = Regex.Matches(html,"<td>.+?</td>",RegexOptions.Singleline);

and the result is 108 lines similar to the following, each containing a different brand name (Acer in this case).

<td><a href=acer-phones-59.php>
<img src="http://cdn2.gsmarena.com/vv/logos/lg_acer.gif" 
width=92 height=22 border=0 alt="Acer"></a></td>
<td><a href=acer-phones-59.php>Acer phones (89)</a></td>

I need the word "acer" only once, and "acer-phones-59.php" only once. How can I adjust my expression to get the brand name and the reference name (the href value) from each line? Any help would be greatly appreciated, thank you.

While you are waiting for somebody to write your regex, you should read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — dognose, Sep 15 '15 at 12:54
Use HtmlAgilityPack. Although it has some peculiar bugs if you want to manipulate HTML code, it is quite reliable for just Web scraping. — Wiktor Stribiżew, Sep 15 '15 at 12:54
Just FYI: no one will be able to answer your question. Rephrase it, specify how one can detect the elements containing your required texts, and then perhaps, there will come an answer. — Wiktor Stribiżew, Sep 15 '15 at 14:13

score -1 · Answer 1 · answered Sep 15 '15 at 17:22


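Regex.Matches( inputString, @"<td>(.|\n)+?href=(.+?)>\s*<img(.|\n)+?alt=""(.+?)""", RegexOptions.None )

Inside a verbatim string (@"..."), a literal quote is written as two quotes. Requiring <img right after the link keeps each href paired with the alt text of its own logo. The href is in group 2 and the brand name is in group 4.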
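The following is a minimal sketch of how the captured values could be read, not the answerer's own code. The class name BrandExtractor, the method Extract, and the group names href and brand are made up for illustration. It assumes every logo cell looks like the sample in the question: an unquoted href, followed directly by an <img> with a double-quoted alt attribute. Real pages may not be that regular, which is why the comments above suggest an HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BrandExtractor
{
    // Named groups instead of numbered ones. Assumption: each logo cell is
    // <td><a href=...> followed directly by an <img> whose alt is double-quoted.
    private static readonly Regex LogoCell = new Regex(
        @"<td>\s*<a\s+href=(?<href>[^\s>]+)>\s*<img\b[^>]*?\balt=""(?<brand>[^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns (brand, href) pairs in document order, one per logo cell.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in LogoCell.Matches(html))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["brand"].Value, m.Groups["href"].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        // The two cells quoted in the question.
        string html =
            "<td><a href=acer-phones-59.php>\n" +
            "<img src=\"http://cdn2.gsmarena.com/vv/logos/lg_acer.gif\" \n" +
            "width=92 height=22 border=0 alt=\"Acer\"></a></td>\n" +
            "<td><a href=acer-phones-59.php>Acer phones (89)</a></td>";

        foreach (var pair in Extract(html))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

The text-only cell ("Acer phones (89)") is skipped because no <img> follows its link, so each brand is reported once. The negated classes [^>] and [^""] also match newlines, so RegexOptions.Singleline is not needed here.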


Derek

