
I need to capture the name of an HTML anchor tag with regex and PHP, so that from the text I get "hello" (the name of the anchor).

I tried this:

$regex  = '/(?<=name\=")#([^]+?)#(?=")/i';  
preg_match_all($regex, $content, $data);
print_r($data);

Tailing the Apache error log, I found:

PHP Warning: preg_match_all(): Compilation failed: missing terminating ] for character class at offset 26

I also tried:

$regex  = '/(?<=name\=")([^]+?)(?=")/i'; 
$regex  = '/(?<=name\=")[^]+?(?=")/i'; 

which are basically the same. I guess I'm missing something, probably a silly slash or something like that, but I'm not sure what.

I'd appreciate any help. Thanks!

SOLVED

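OK, thanks to @stillstanding and @Gordon I've managed to do it with DOMDocument, which is much simpler. For the record, here is the snippet:

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // don't emit warnings for partial/imperfect HTML
$dom->loadHTML($content);
foreach ($dom->getElementsByTagName('a') as $node) {
    // print the name attribute of each <a> element that has one
    if ($node->hasAttribute('name')) {
        echo $node->getAttribute('name'), "\n";
    }
}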
TwoDiv
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Nov 15 '10 at 11:41
  • possible duplicate of [Regular expression for grabbing the href attribute of an A element](http://stackoverflow.com/questions/3820666/regular-expression-for-grabbing-the-href-attribute-of-an-a-element) – Gordon Nov 15 '10 at 11:43
  • Don’t PHP users use `/x` mode so their patterns can be processed in **non-insane mode**? How come? – tchrist Nov 15 '10 at 11:44

4 Answers


Use DOMXPath for this along with DOMDocument or SimpleXML. But never, ever use regex patterns!
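A minimal sketch of the DOMXPath approach, assuming the markup is an HTML fragment held in a string; the sample markup and the `$names` variable are illustrative only. The XPath expression `//a/@name` selects the name attribute of every `<a>` element, so links without a name and non-anchor elements are skipped:

```php
<?php
// Illustrative fragment: two named anchors, one plain link, one span with a name.
$content = '<p><a name="hello">Hi</a> <a href="#hello">link</a>'
         . ' <span name="nope">x</span> <a name="world">W</a></p>';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them (partial HTML).
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$names = [];
// Each result is a DOMAttr node; ->value holds the attribute's text.
foreach ($xpath->query('//a/@name') as $attr) {
    $names[] = $attr->value;
}

echo implode(', ', $names), "\n"; // hello, world
```

Unlike a regex, this does not care about attribute order, quoting style, or extra attributes on the tag.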

bcosca
  • It’s perfectly fine to use patterns **IF AND ONLY IF** you have generated the markup language yourself, because you can therefore be 100% guaranteed that it conforms to a particular subset of that markup language. In the general case, it is too hard, but in many specific cases, it is perfectly acceptable. – tchrist Nov 15 '10 at 11:43
  • The problem is that I don't get an HTML page or XML file, only a content block: it's actually what I get from WordPress's get_the_Content() function. The markup is mine; I created it with a WordPress content filter, so it will always be in the form of something. – TwoDiv Nov 15 '10 at 11:56
  • @TwoDiv doesn't matter. Any of the tools listed in the related link below your question can work with partial HTML. See the close-vote question for a working example; you just need to exchange href with name. If you are sure it's always `` match `##`. Should make it ungreedy though. – Gordon Nov 15 '10 at 11:58
  • Thanks @stillstanding and thanks @Gordon - I've managed to get that working, I will update my original posting – TwoDiv Nov 15 '10 at 12:28
// "#" followed by a name, optionally, at the end of the string
$regex = '/(#[a-z_.-][a-z0-9+$_.-]*)?$/';
preg_match($regex, $yourstring, $result);

e.g.:

$yourstring = "somelink.html#this";
$regex = '/(#[a-z_.-][a-z0-9+$_.-]*)/';
preg_match($regex, $yourstring, $result);
echo substr($result[0], 1); // drop the leading "#"

This would print 'this'.

However, the parse_url function is probably a better bet for getting this information from an address:

http://www.php.net/manual/en/function.preg-match.php#96339
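For reference, a short sketch of the parse_url route, reusing the sample string from the example above (assuming the input is a URL-like string rather than HTML markup). Passing `PHP_URL_FRAGMENT` asks for just the part after "#":

```php
<?php
$yourstring = "somelink.html#this";

// Ask parse_url() for only the fragment component (the text after "#").
$fragment = parse_url($yourstring, PHP_URL_FRAGMENT);
echo $fragment, "\n"; // this

// When the URL has no fragment, the requested component comes back as null.
$none = parse_url("somelink.html", PHP_URL_FRAGMENT);
var_dump($none); // NULL
```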

If you wish to replace the actual anchor tags within a doc, see here

SW4
  • I think the OP wants the contents of the name attribute of an A element and not the fragment of a URL. – Gordon Nov 15 '10 at 11:47

Your `[^]+?` is a syntax error. In PCRE, a `]` that comes right after `[^` is taken as a literal character, so the class never closes; that is the "missing terminating ]" in your log. What is it supposed to be? A minimal match of one or more instances, preferring fewer, of what thing? If you mean a literal `^`, then you should just write `\^`. But if you mean any character that is not a `^`, you could use `[^^]`, which you may write `[^\^]` if that seems clearer to you.

If you mean a character that is not at the beginning of the line, well, that’s somewhat different. You could use a negative lookbehind, perhaps. But more information is needed.
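Applied to the pattern from the question, the usual fix is a negated class that stops at the closing quote, `[^"]+`. A sketch, assuming the attribute is always written exactly as `name="..."` with double quotes (the sample `$content` is made up):

```php
<?php
// Hypothetical sample content; only double-quoted name attributes are handled.
$content = '<a name="hello">Hi</a> and <a name="world">W</a>';

// (?<=name=") : lookbehind, the match must be preceded by name="
// [^"]+       : one or more characters that are not a double quote
// (?=")       : lookahead, the match must be followed by a double quote
$regex = '/(?<=name=")[^"]+(?=")/i';

preg_match_all($regex, $content, $data);
print_r($data[0]);
```

Being a plain string match, this would also pick up `name="..."` on elements other than `<a>`.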

If you are really bound and determined to use a regex to split HTML tags, then you should at least do it properly.

tchrist
  • I'm pretty lousy at regex, so I took the expression from http://gskinner.com/RegExr/ . It should get the name attribute. It worked fine in the online regex tester but not in PHP; I gather that's because PHP uses a slightly different regex syntax. – TwoDiv Nov 15 '10 at 12:00
  • @TwoDiv: Yes, this is the bane of regexes: that a particular syntax will work, or fail to work, differently in the different applications. Even though most say they are Perl derived, that doesn’t tell anything like the whole story, nor make them mutually compatible, as I see you’ve discovered. Hopefully PHP will catch up to PCRE 8 one of these days, which should help. – tchrist Nov 15 '10 at 12:06

This will only work for the exact string <a name="[variable]"> (a string, not an element: regexes have no notion of elements or attributes, so they cannot parse HTML). See the links below your question for alternative approaches.

$text = '
    <a name="anything">something</a> blabla
    <span name="something">something</span>  blabla
    <a name="something else">something else</a>  blabla
';

preg_match_all('#<a name="(.*)">#', $text, $matches);
print_r($matches);

gives

Array
(
    [0] => Array
        (
            [0] => <a name="anything">
            [1] => <a name="something else">
        )

    [1] => Array
        (
            [0] => anything
            [1] => something else
        )
)
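Because `(.*)` is greedy, the pattern above only behaves when each anchor sits on its own line (`.` does not match newlines by default). A sketch of what happens with two anchors on one line, next to the ungreedy `(.*?)` variant mentioned in the comments (the sample string is made up):

```php
<?php
$line = '<a name="one">1</a> <a name="two">2</a>';

// Greedy: .* runs to the LAST '">' on the line, so both anchors collapse into one match.
preg_match_all('#<a name="(.*)">#', $line, $greedy);

// Ungreedy: .*? stops at the FIRST '">', giving one match per anchor.
preg_match_all('#<a name="(.*?)">#', $line, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```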

Marking this CW because the topic has been beaten to death.

Gordon