
I have a page with roughly 10-15 links. All of the links are under my control and end with a word such as `celebrity`. I want to extract every link that ends with that word, so I wrote this:

    $regex = '|<a.*?href="(.*_celebrity)"|';

    preg_match_all($regex, $result21, $parts);
    $links = $parts[0];
    foreach ($links as $link) {
        echo $link . "<br>";
        mysql_query("INSERT INTO tablea(linkssas) VALUES ('$link')");
    }

It does the job and finds all the links ending with `_celebrity`, but the output is not stored in the database the way I expect. Because I am using `foreach`, each link should be inserted as a separate row, yet all the links end up in a single row. They are also not plain URLs: they are stored with leftover anchor markup, like `http://xyz.com/edje/jjeieied_celebrity">A</a>`.

I want only the plain links in the database.

gen_Eric
james
  • You should not use a regex to get the links, but DOMDocument instead. Please read: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Bgi Feb 07 '13 at 14:49
  • Sounds like a problem with a greedy regex. Really you want `href="(.*?_celebrity)"`, but _really_ you are better off using a proper DOM parser like DOMDocument or SimpleXML for this. – Michael Berkowski Feb 07 '13 at 14:49
  • This sounds like a job for Tony The Pony..... Or better yet, [read this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454), it's a good explanation of why you shouldn't try to parse HTML using regex. – SDC Feb 07 '13 at 14:55

2 Answers


I felt obliged to give you the DOMDocument tour:

$d = new DOMDocument();
$d->loadHTML($result21);

$suffix = "_celebrity";
$suffix_len = strlen($suffix);

foreach ($d->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    if ($href && substr($href, -$suffix_len) === $suffix) {
        // do your insert here
    }
}

Or, using XPath instead of getElementsByTagName:

$xp = new DOMXPath($d);

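// string-length(@href) - 9 is the 1-based start of the last 10 characters,
// which is the length of "_celebrity".
foreach ($xp->query('//a[substring(@href, string-length(@href) - 9) = "_celebrity"]') as $node) {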
    echo $node->getAttribute('href');
}

And here's a message from our chat room:

Please, don't use mysql_* functions in new code. They are no longer maintained and are officially deprecated. See the red box? Learn about prepared statements instead, and use PDO, or MySQLi - this article will help you decide which. If you choose PDO, here is a good tutorial.
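
To make the prepared-statement advice concrete, here is a minimal sketch rather than a definitive implementation. It assumes PDO with an in-memory SQLite database (so the `pdo_sqlite` extension must be available) purely so that it runs without a server; for MySQL only the DSN and credentials would change. The table and column names `tablea` and `linkssas` come from the question, while the sample HTML and the helper name `extractSuffixLinks` are hypothetical.

```php
<?php
// Hypothetical helper: collect href values that end with $suffix.
function extractSuffixLinks(string $html, string $suffix): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely well-formed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '' && substr($href, -strlen($suffix)) === $suffix) {
            $links[] = $href;
        }
    }
    return $links;
}

// Made-up sample input standing in for $result21.
$result21 = '<html><body>'
    . '<a href="http://xyz.com/a/one_celebrity">A</a>'
    . '<a href="http://xyz.com/a/other">B</a>'
    . '<a href="http://xyz.com/b/two_celebrity">C</a>'
    . '</body></html>';

// Assumption: in-memory SQLite. For MySQL, use a DSN such as
// 'mysql:host=localhost;dbname=mydb;charset=utf8mb4' plus user and password.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tablea (linkssas TEXT)');

// Prepare once, execute once per link: each link becomes its own row.
$stmt = $pdo->prepare('INSERT INTO tablea (linkssas) VALUES (:link)');
foreach (extractSuffixLinks($result21, '_celebrity') as $link) {
    $stmt->execute([':link' => $link]);
}

$rows = $pdo->query('SELECT linkssas FROM tablea')->fetchAll(PDO::FETCH_COLUMN);
print_r($rows);
```

Because the link is bound as a parameter instead of being interpolated into the SQL string, quotes or other special characters in a URL cannot break the statement.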

Zoe
Ja͢ck
  • And do the insert with PDO using bindParam – Bgi Feb 07 '13 at 14:57
  • Or maybe better even xpath: [How to use XPath function in a XPathExpression instance programatically?](http://stackoverflow.com/questions/402211/how-to-use-xpath-function-in-a-xpathexpression-instance-programatically) – hakre Feb 07 '13 at 15:05
  • @jack It's not working. I am trying this, with a few lines added to echo the value: `$op7=''.$link->getAttribute('href').''; echo $op7;` – james Feb 07 '13 at 15:13

You probably want to loop through $parts[1] instead of $parts[0].

http://php.net/manual/en/function.preg-match-all.php
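
As a small sketch of the difference, using made-up HTML: `$parts[0]` holds the full text matched by the pattern, while `$parts[1]` holds only what the first capture group matched. The pattern below swaps `.*` for `[^"]*`, which is an assumption about what is wanted rather than the asker's original regex; it keeps a match from running past the closing quote of the `href`.

```php
<?php
// Made-up sample input standing in for $result21.
$result21 = '<a href="http://xyz.com/a/one_celebrity">A</a> '
    . '<a href="http://xyz.com/b/two_celebrity">B</a>';

// [^"]* cannot cross a double quote, so each match stays inside one href value.
$regex = '|<a[^>]*?href="([^"]*_celebrity)"|';

preg_match_all($regex, $result21, $parts);

// $parts[0]: full matches, including the leading <a ... href=" text
// $parts[1]: first capture group only, i.e. the bare URLs
foreach ($parts[1] as $link) {
    echo $link, "\n";
}
```

Looping over `$parts[1]` yields one bare URL per iteration, which is the value to insert into the database.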

Bgi
  • I had to modify the regex, but it is bad practice to use a regex in this case. It is also bad practice to use mysql_query(). – Bgi Feb 07 '13 at 14:55