I want to match, with a single pattern, all anchor tags whose href attribute contains "goto". I am using PHP. I want to match links like:

<a  href='http://www.mysite.com/goto/profile'>links </a>

I have written a regex like:

<a.*(href).*(goto).*<\/a>

It works for links like the one above, but if there is a newline inside the anchor tag, it does not match. E.g.:

 <a  href='http://www.mysite.com/goto/profile'>
links </a>

It does not match because of the newline. I need a regular expression that matches links with and without newlines.

Awlad Liton
  • What language/tool/whatever are you using for “executing” your regex? – CBroe Nov 04 '13 at 12:16
  • Generally [using regexes to parse HTML is a bad idea](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not). In your specific case, though, you can probably get away with it using the dotall (s) flag, which lets the dot match newlines. – Dan Nov 04 '13 at 12:18
  • Maybe just use a parser instead of processing with a regexp; it can be easier. You don't mention what platform you are on, otherwise one could be suggested. – Vorsprung Nov 04 '13 at 12:18
  • Please see the edited question again. I am using PHP. – Awlad Liton Nov 04 '13 at 12:19

4 Answers

You're looking for the "dot all" modifier /s.

From the manual:

/s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.

Hence, /.*/ stops at every newline and so matches one line at a time, while /.*/s matches across line breaks.

DEMO

DEMO (without DOTALL)
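
As a minimal sketch of the difference, the snippet below runs the pattern from the question against the multi-line anchor, once without and once with the s modifier. The variable names and the ~ delimiter are choices made for this illustration, not part of the original answer.

```php
<?php
// Sample input: the anchor from the question, with a newline inside it.
$html = "<a  href='http://www.mysite.com/goto/profile'>\nlinks </a>";

// The question's pattern, first without and then with the s (PCRE_DOTALL) modifier.
$withoutDotAll = preg_match('~<a.*(href).*(goto).*<\/a>~', $html);
$withDotAll    = preg_match('~<a.*(href).*(goto).*<\/a>~s', $html);

var_dump($withoutDotAll); // int(0): the dot cannot cross the newline
var_dump($withDotAll);    // int(1): the dot now matches the newline too
```

preg_match returns 1 on a match and 0 on no match, so the two results make the effect of the modifier visible directly.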

h2ooooooo

You can use the regex:

<a.*(href).*(goto).*([\n]*.*)*<\/a> 

For parsing HTML it is advisable to use an HTML parser rather than a regex. Depending on the language, various HTML parsers are available; e.g. in Python there is Beautiful Soup.
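
Since the question is about PHP, here is a rough sketch of the parser approach using PHP's bundled DOM extension (DOMDocument). The sample markup and variable names are made up for illustration; the answer itself does not prescribe a particular PHP parser.

```php
<?php
// Hypothetical sample input: one "goto" link spanning two lines, one unrelated link.
$html = "<p><a  href='http://www.mysite.com/goto/profile'>\nlinks </a>"
      . "<a href='http://www.mysite.com/about'>about</a></p>";

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them for imperfect HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Keep the href of every anchor whose href contains "goto".
$matches = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');
    if (strpos($href, 'goto') !== false) {
        $matches[] = $href;
    }
}

print_r($matches);
```

Because the parser works on elements and attributes rather than raw text, newlines inside the anchor make no difference here.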

Nikhil Titus

Use <a(.|\n)*(href).*(goto)(.|\n)*<\/a> to allow multiple lines.
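
One caveat worth checking before relying on this: (.|\n)* is greedy, so when the input holds several anchors, a single match can run from the first <a to the last </a>. The sketch below contrasts that with a tighter variant; the tighter pattern and the sample input are assumptions made for illustration (it expects quoted href values) and are not part of this answer.

```php
<?php
// Hypothetical input with three anchors; two of them have "goto" in the href.
$html = "<a href='http://www.mysite.com/about'>about</a>\n"
      . "<a  href='http://www.mysite.com/goto/profile'>\nlinks </a>\n"
      . "<a href='http://www.mysite.com/goto/settings'>settings</a>";

// The greedy pattern from the answer: one match swallows all three anchors.
$greedyCount = preg_match_all('~<a(.|\n)*(href).*(goto)(.|\n)*<\/a>~', $html, $greedy);

// Tighter sketch: [^>]* stays inside the opening tag, the quoted href value
// must contain "goto", and the lazy .*? stops at the nearest </a>.
// i = case-insensitive, s = dot matches newlines.
$pattern = '~<a\b[^>]*\bhref\s*=\s*(["\'])([^"\']*goto[^"\']*)\1[^>]*>.*?</a>~is';
preg_match_all($pattern, $html, $found);

var_dump($greedyCount); // int(1)
print_r($found[2]);     // the two href values that contain "goto"
```

The second capture group holds the href value, so $found[2] lists only the matching links, one entry per anchor.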

Naveed S

If you want to allow multiple lines only inside the anchor's content, as you described, do it like this:

<a.*(href).*(goto).*(>)(.|\n)*<\/a>

A handy regex testing tool for PHP can be found here: PHP Live Regex Tester

CodeFanatic