0

I need some help with this regex magic.

I have this: <a href="/en/node/1032/delete?destination=node%2F5%2Fblog">delete</a>

and this regex:

(<a)*([^>]*>)[^<]*(</a>)

which gives me:

$1 = <a
$2 = href="/en/node/1032/delete?destination=node%2F5%2Fblog">
$3 = </a>

I need some additional strings:

  • 1032
  • href="/en/ (the "en" part is dynamic!)

How can I get these strings?

This will be used in PHP.

user633163

2 Answers

1

Your sample could be captured with

(<a)\b.*?((href="/en/).*?(?<=/)(\d+)/.*?".*?>).*?(</a>)

...but you will probably want to replace the literal "en" with something broader, such as [a-z]+, depending on what you want to capture.
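For illustration, here is a minimal PHP sketch of the same idea using `preg_match`. It is a simplified variant, not the exact pattern above. The function name `extractLinkParts`, the `[a-z]+` shape for the language code, and the literal `node/` segment are assumptions made for this example.

```php
<?php
// Hypothetical helper: pulls the href prefix, language code and node id
// out of a single anchor tag. Returns null when nothing matches.
function extractLinkParts(string $html): ?array
{
    // "~" is the delimiter so the slashes in the URL need no escaping.
    // Assumptions: the language code is lowercase letters only, and the
    // path always looks like /<lang>/node/<id>/...
    $pattern = '~<a\b[^>]*?(href="/(?<lang>[a-z]+)/)node/(?<id>\d+)/[^"]*"[^>]*>~';

    if (!preg_match($pattern, $html, $m)) {
        return null;
    }

    return [
        'prefix' => $m[1],      // e.g. href="/en/
        'lang'   => $m['lang'], // e.g. en
        'id'     => $m['id'],   // e.g. 1032
    ];
}

$sample = '<a href="/en/node/1032/delete?destination=node%2F5%2Fblog">delete</a>';
print_r(extractLinkParts($sample));
```

Named groups keep the call site readable if the pattern later gains or loses a capture group.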

HOWEVER, and I want to emphasize this, don't use regex to parse HTML. The above regex won't work for certain HTML-valid input, and due to the limitations of regex it cannot be refined to work for every possible case. You'll get better, more correct results with an HTML or XML parser.
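In PHP, one such parser is the built-in `DOMDocument` class. The sketch below shows roughly how the two values could be read from the `href` attribute without a regex. The function name `linkPartsFromHtml` is made up for this example, and the code assumes the path always has the form /<lang>/node/<id>/... so that the id is the third path segment.

```php
<?php
// Hypothetical helper: returns one ['lang' => ..., 'id' => ...] entry
// per <a> element whose href matches the assumed path layout.
function linkPartsFromHtml(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');

        // Drop the query string, keep only the path.
        $path = parse_url($href, PHP_URL_PATH);
        if (!is_string($path)) {
            continue;
        }

        // "/en/node/1032/delete" becomes ['en', 'node', '1032', 'delete'].
        $parts = explode('/', trim($path, '/'));

        // Assumption: segment 0 is the language, segment 2 is the numeric id.
        if (count($parts) >= 3 && ctype_digit($parts[2])) {
            $result[] = ['lang' => $parts[0], 'id' => $parts[2]];
        }
    }

    return $result;
}

$sample = '<a href="/en/node/1032/delete?destination=node%2F5%2Fblog">delete</a>';
print_r(linkPartsFromHtml($sample));
```

Unlike the regex, this keeps working when the tag has extra attributes, different quoting, or line breaks inside it.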

Community
Justin Morgan - On strike
  • This is not working. Do you think it would be better to use some PHP string functions? – user633163 Feb 24 '11 at 22:22
  • @user633163 I haven't used PHP in some time, but I think you want the `DOMDocument` class: http://www.phpro.org/examples/Parse-HTML-With-PHP-And-DOM.html and http://php.net/manual/en/class.domdocument.php Use `getElementsByTagName('a')` and grab the `href` attribute. Split the resulting string on the `/` character and you should have what you want. – Justin Morgan - On strike Feb 24 '11 at 22:32
0

([^/ ]+) matched repeatedly against the string will give you, among other tokens, href=" and en and node and 1032
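A rough PHP sketch of this tokenizing approach, assuming the character class is repeated with `+` and applied with `preg_match_all`. The function name `slashTokens` is made up for this example, and picking values by position only works while the link keeps the same layout as the sample.

```php
<?php
// Hypothetical helper: splits the input into runs of characters that are
// neither a slash nor a space.
function slashTokens(string $html): array
{
    preg_match_all('~[^/ ]+~', $html, $matches);
    return $matches[0];
}

$sample = '<a href="/en/node/1032/delete?destination=node%2F5%2Fblog">delete</a>';
$tokens = slashTokens($sample);

// For the sample, the tokens start with: <a, href=", en, node, 1032
$lang = $tokens[2]; // en
$id   = $tokens[4]; // 1032
```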

bluesman
  • Not sure if I understand your question but you can use the regex to get the groups and get your data. – bluesman Mar 02 '11 at 17:59