I have this string:

rder=3D"0" width=3D"650">=0D=0A <tr>=0D=0A <td valign=3D"top">=0D=0A <p>=0D=0A <strong>Hi Mike Tyson</strong>,<br/>=0D=0A =

I want to extract Mike Tyson from the string. Everything except the name is always the same in the string above, so my first idea was to use this regex:

[^rder=3D"0" width=3D"650">=0D=0A <tr>=0D=0A <td valign=3D"top">=0D=0A <p>=0D=0A <strong>Hi ].*[^<\/strong>,<br\/>=0D=0A =]

However, this outputs Mike Ty instead of Mike Tyson. Any ideas?

Andy Lester
JohnSmith1976
  • Can you give the full HTML code for this? – Arup Rakshit Jul 19 '13 at 12:32
  • You have misunderstood how character classes work. Take a look at this link: http://www.regular-expressions.info/charclass.html – Casimir et Hippolyte Jul 19 '13 at 12:43
  • This very popular question and its first answer summarise how regex-ing into HTML is generally received: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags - TL;DR: you'll get away with it in a few places, but as soon as it starts to cost you time to debug, stop and get an HTML parser (Ruby's `nokogiri` is good) to do the work – Neil Slater Jul 19 '13 at 12:47

1 Answer

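The square brackets turn each bracketed part into a character class, so [^...] matches one single character that is not in the set; it does not exclude the literal text. The letters s, o and n are in the second set (they come from "strong"), so the match has to end on the y, which is why you get Mike Ty.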

This expression uses a lookbehind and a lookahead for the fixed text instead, and will match Mike Tyson:

(?<=rder=3D"0" width=3D"650">=0D=0A <tr>=0D=0A <td valign=3D"top">=0D=0A <p>=0D=0A <strong>Hi ).*?(?=<\/strong>,<br\/>=0D=0A =)

Live Example: http://www.rubular.com/r/OaK2ZmbSPh
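
For anyone who wants to try this outside Rubular, here is a minimal Ruby sketch. It assumes the input is exactly the one-line sample from the question. The variable names are made up for illustration, and the shorter capture-group pattern at the end is my own assumption about which anchors are sufficient, not something the question confirms.

```ruby
# Sample string copied from the question (the =3D / =0D=0A residue is literal text).
input = 'rder=3D"0" width=3D"650">=0D=0A <tr>=0D=0A <td valign=3D"top">=0D=0A <p>=0D=0A <strong>Hi Mike Tyson</strong>,<br/>=0D=0A ='

# Why the original attempt stops at "Ty": the trailing [^...] is a set of
# single characters, and that set contains s, o and n.
excluded = '</strong>,<br/>=0D=0A ='.chars.uniq
excluded_has_son = %w[s o n].all? { |c| excluded.include?(c) }
puts excluded_has_son   # => true

# Lookbehind / lookahead: the fixed text must surround the match,
# but it is not part of the match itself.
lookaround = /(?<=rder=3D"0" width=3D"650">=0D=0A <tr>=0D=0A <td valign=3D"top">=0D=0A <p>=0D=0A <strong>Hi ).*?(?=<\/strong>,<br\/>=0D=0A =)/
name_lookaround = input[lookaround]
puts name_lookaround    # => Mike Tyson

# Hypothetical shorter variant: anchor only on the <strong> element and
# read the name from capture group 1.
name_capture = input[/<strong>Hi (.*?)<\/strong>/, 1]
puts name_capture       # => Mike Tyson
```

The capture-group form is easier to read, but it only holds up if "<strong>Hi " appears once in the real message, so check that against the full HTML before relying on it.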

Ro Yo Mi