
This regex comes from Atwood and is used to filter out anchor tags that contain anything other than an href and an optional title:

 <a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"(\stitle="[^"]+")?\s?>
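For reference, here is a minimal Python sketch that applies this pattern unchanged. It assumes the sanitizer checks each tag on its own, so `re.fullmatch` is used to require that the whole tag matches; the sample tags are made up for illustration. The last two samples show that the pattern rejects a `target` attribute and a scheme that is not whitelisted.

```python
import re

# The whitelist pattern from the question, unchanged.
ORIGINAL = re.compile(
    r'<a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"'
    r'(\stitle="[^"]+")?\s?>'
)


def is_allowed(tag):
    """Return True if the entire tag matches the whitelist pattern."""
    return ORIGINAL.fullmatch(tag) is not None


if __name__ == "__main__":
    samples = [
        '<a href="http://www.google.com">',
        '<a href="http://www.google.com" title="Google">',
        '<a href="#42">',
        '<a href="http://www.google.com" target="_blank">',  # rejected
        '<a href="javascript:alert(1)">',  # rejected
    ]
    for tag in samples:
        print(is_allowed(tag), tag)
```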

I need to allow an additional attribute that must match exactly: target="_blank". So the following tag should be allowed:

 <a href="http://www.google.com" target="_blank">

I tried changing the pattern to these:

 <a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"(\stitle="[^"]+")(\starget="_blank")?\s?>
 <a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"(\stitle="[^"]+")(\starget=\"_blank\")?\s?>

Clearly I don't know regex very well. How should the pattern be adjusted to allow the blank target and no other targets?
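A quick way to see what went wrong is to run the first attempt against a few tags. In both attempts the `?` that followed `(\stitle="[^"]+")` in the original has been dropped, so a title is now mandatory: the tag from the question is rejected, and so is a plain href-only tag. The Python sketch below assumes whole-tag matching with `re.fullmatch`; `attempt_allows` is a hypothetical helper name.

```python
import re

# First attempt from the question. The title group no longer ends in "?",
# so a title attribute is required before the optional target.
ATTEMPT = re.compile(
    r'<a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"'
    r'(\stitle="[^"]+")(\starget="_blank")?\s?>'
)


def attempt_allows(tag):
    """Return True if the entire tag matches the attempted pattern."""
    return ATTEMPT.fullmatch(tag) is not None


if __name__ == "__main__":
    print(attempt_allows('<a href="http://www.google.com" target="_blank">'))
    print(attempt_allows('<a href="http://www.google.com">'))
    print(attempt_allows(
        '<a href="http://www.google.com" title="Google" target="_blank">'))
```

Putting the `?` back after the title group, while keeping the new optional `(\starget="_blank")?` group after it, makes both attributes optional.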

Sailing Judo
  • You shouldn't use regex to parse HTML: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Max Shawabkeh Mar 16 '10 at 19:53
  • While that's certainly an interesting answer, it seems a little silly to take it literally in all cases. What I am using it for is a simple sanitization routine meant only to ensure that a few basic tags are allowed. Regex certainly seems up to this task, even if *I* am not. ;) – Sailing Judo Mar 16 '10 at 20:10
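As a rough illustration of the kind of routine described here, a whitelist sanitizer can find everything that looks like a tag and drop any tag that does not fully match an allowed pattern. This is a minimal Python sketch, not the asker's code: `ALLOWED_TAGS`, `TAG` and `sanitize` are hypothetical names, and the anchor pattern is assumed to be the question's pattern with an optional `target="_blank"` group added. It does not balance tags, so an orphaned `</a>` can survive.

```python
import re

# Hypothetical whitelist: the question's anchor pattern with an optional
# target="_blank" group added, plus a bare closing </a>.
ALLOWED_TAGS = [
    re.compile(
        r'<a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"'
        r'(\stitle="[^"]+")?(\starget="_blank")?\s?>'
    ),
    re.compile(r'</a>'),
]

# Anything that looks like a tag: "<" up to the next ">" or the end of input.
TAG = re.compile(r'<[^>]*(>|$)')


def sanitize(html):
    """Remove every tag that does not fully match an entry in ALLOWED_TAGS.

    Tags are not balanced, so an orphaned </a> can remain after its
    opening tag is removed.
    """
    def keep_or_drop(match):
        tag = match.group(0)
        if any(p.fullmatch(tag) for p in ALLOWED_TAGS):
            return tag
        return ""

    return TAG.sub(keep_or_drop, html)


if __name__ == "__main__":
    print(sanitize('<a href="http://www.google.com" target="_blank">Google</a>'))
    print(sanitize('<script>alert(1)</script>hi'))
    print(sanitize('<a href="http://www.google.com" target="_top">x</a>'))
```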

2 Answers

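 <a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"(\stitle="[^"]+")?(\starget="_blank")?\s?>

This will do what you are asking: the title stays optional, and an optional target="_blank" may follow it, but no other target value is accepted.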
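To check that a pattern with both the title and target groups optional admits `target="_blank"` and nothing else, the Python sketch below runs a handful of sample tags through it. It assumes whole-tag matching with `re.fullmatch`; the sample tags and their expected results are illustrative.

```python
import re

# Whitelist with an optional title, then an optional target="_blank".
PATTERN = re.compile(
    r'<a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"'
    r'(\stitle="[^"]+")?(\starget="_blank")?\s?>'
)

# Sample tags mapped to whether they should be allowed.
CASES = {
    '<a href="http://www.google.com">': True,
    '<a href="http://www.google.com" title="Google">': True,
    '<a href="http://www.google.com" target="_blank">': True,
    '<a href="http://www.google.com" title="Google" target="_blank">': True,
    '<a href="http://www.google.com" target="_top">': False,
    '<a href="http://www.google.com" target="_blank" onclick="x()">': False,
}


def allowed(tag):
    """Return True if the entire tag matches the pattern."""
    return PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag, expected in CASES.items():
        status = "ok" if allowed(tag) == expected else "MISMATCH"
        print(status, allowed(tag), tag)
```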

If you are new to regex, let me recommend RegexBuddy. It is a program that lets you test your regexes against sample text or sample files, which saves a lot of time.

RegexBuddy: http://www.regular-expressions.info/regexbuddy.html

http://www.regular-expressions.info is also a good resource.

Jason
  • Note that this solution requires the attributes (href, target and title) to appear in a specific order. – Felix Mar 16 '10 at 19:55
  • I was using this url to test with but hadn't come up with a pattern that worked. http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx – Sailing Judo Mar 16 '10 at 20:00
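If the order of `title` and `target` should not matter, one option is to spell out both orders in an alternation. The Python sketch below is one way to do that, assuming whole-tag matching with `re.fullmatch`; it switches to non-capturing groups, `href` must still come first, and each attribute may appear at most once. Enumerating orders grows quickly as attributes are added, so beyond two optional attributes it is probably simpler to parse the attributes and check them one by one.

```python
import re

HREF = r'<a\shref="(?:\#\d+|(?:https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"'
TITLE = r'\stitle="[^"]+"'
TARGET = r'\starget="_blank"'

# Either title then target (each optional), or target then title (both present).
ATTRS = rf'(?:(?:{TITLE})?(?:{TARGET})?|{TARGET}{TITLE})'

PATTERN = re.compile(HREF + ATTRS + r'\s?>')


def allowed(tag):
    """Return True if the entire tag matches the order-tolerant pattern."""
    return PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in [
        '<a href="http://www.google.com" title="Google" target="_blank">',
        '<a href="http://www.google.com" target="_blank" title="Google">',
        '<a href="http://www.google.com" target="_blank" target="_blank">',
        '<a title="Google" href="http://www.google.com">',
    ]:
        print(allowed(tag), tag)
```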
<a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"(\stitle="[^"]+")(\starget="_blank")>
zellio