I am trying to extract text from a text file, but the text I need to collect varies in length. This is my first attempt at using regex, and I could use some suggestions.

Here is the source text. I am trying to extract/parse the Name, Email, Birthdate and Phone Number only. Any help would be appreciated.

Basic data
</td><td align="left" width="10" style="padding:0; margin:0;"> </td><td align="left" width="290" style="padding:0;"> </td></tr><tr><td align="right" width="250" style="padding-bottom:8px; margin:0; color: #555555; font-family: Arial, Helvetica, sans-serif; font-size:14px;">
Name:
</td><td align="left" width="10" style="padding:0; margin:0;"> </td><td align="left" width="290" style="color: #262626; padding-bottom:8px ; font-family: Arial, Helvetica, sans-serif; font-size:14px;">Test User3</td></tr><tr><td align="right" width="250" style="padding-bottom:8px; margin:0; color: #555555; font-family: Arial, Helvetica, sans-serif; font-size:14px;">
Email:
</td><td align="left" width="10" style="padding:0; margin:0;"> </td><td align="left" width="290" style="color: #262626; padding-bottom:8px ; font-family: Arial, Helvetica, sans-serif; font-size:14px;"><span style="color: #262626; text-decoration:none;">testuser3@busystreet.com</span></td></tr><tr><td align="center" colspan="3" height="20" width="100%" style="color: #262626; padding:0; margin:0; line-height:20px;"> </td></tr><tr><td align="right" width="250" style="padding-bottom:8px; margin:0; color: #002a5c; font-family: Arial, Helvetica, sans-serif; font-size:14px;">
Custom data
</td><td align="left" width="10" style="padding:0; margin:0;"> </td><td align="left" width="290" style="padding:0;"> </td></tr><tr><td align="right" width="250" style="padding-bottom:8px; margin:0; color: #555555; font-family: Arial, Helvetica, sans-serif; font-size:14px;">ref:
</td><td align="left" width="10" style="padding:0; margin:0;"> </td><td align="left" width="290" style="color: #262626; padding-bottom:8px ; font-family: Arial, Helvetica, sans-serif; font-size:14px;">06/16/1963</td></tr><tr><td align="right" width="250" style="padding-bottom:8px; margin:0; color: #555555; font-family: Arial, Helvetica, sans-serif; font-size:14px;">cellphone:
                                                            </td><td align="left" width="10" style="padding:0; margin:0;"> </td><td align="left" width="290" style="color: #262626; padding-bottom:8px ; font-family: Arial, Helvetica, sans-serif; font-size:14px;">6152498588</td></tr><tr><td align="center" colspan="3" height="20" width="100%" style="color: #262626; padding:0; margin:0; line-height:20px;"> </td></tr><tr><td align="right" width="250" style="padding-bottom:8px; margin:0; color: #002a5c; font-family: Arial, Helvetica, sans-serif; font-size:14px;">

Thanks in advance,

Doug

Jim Mischel
    [You really shouldn't try to use regular expressions to parse HTML.](http://stackoverflow.com/a/1732454/41071) – svick Apr 14 '12 at 16:35

2 Answers

Use the HTML Agility Pack instead. Parsing HTML with regex is a bad idea except in very specific cases.
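To illustrate the parser-based approach, here is a minimal sketch. It does not use the HTML Agility Pack (a .NET library); it swaps in Python's standard-library `html.parser` to show the same idea. The names `RowCollector`, `extract_fields` and `SAMPLE` are hypothetical, and the sample is a trimmed copy of the markup in the question. It assumes each field is one table row whose first cell is a label ending in a colon and whose last cell is the value, and that the birthdate is the value stored under `ref:`.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <span> still lands in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_fields(html):
    """Return {label: value} for rows shaped like 'Label:' ... value."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    fields = {}
    for row in parser.rows:
        # Keep only rows whose first cell is a label and whose last cell has text.
        if len(row) >= 2 and row[0].endswith(":") and row[-1]:
            fields[row[0].rstrip(":").strip()] = row[-1]
    return fields


# Trimmed copy of the markup from the question (style attributes removed).
SAMPLE = """
<table>
<tr><td align="right">
Name:
</td><td> </td><td>Test User3</td></tr>
<tr><td align="right">
Email:
</td><td> </td><td><span>testuser3@busystreet.com</span></td></tr>
<tr><td align="right">ref:
</td><td> </td><td>06/16/1963</td></tr>
<tr><td align="right">cellphone:
    </td><td> </td><td>6152498588</td></tr>
</table>
"""

if __name__ == "__main__":
    for label, value in extract_fields(SAMPLE).items():
        print(f"{label}: {value}")
```

Because the values are read from whole cells, their varying length does not matter, which is the part that is awkward to express in a regex.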

David Brabant

You are better off using SimpleXML than regex to parse HTML!

Andreas Linden