1

I am using this code to extract all the links from a page:

<?php
/* Get all links from the page fetched below */
$page = file_get_contents('http://www.codacons.it/rassegna_quest.asp?idSez=14'); 
preg_match_all("/<a.*>(.*?)<\/a>/", $page, $matches, PREG_SET_ORDER); 
echo "All links : <br/>"; 
foreach($matches as $match){ 
    echo $match[1]."<br/>"; 
} 
?> 

But it does not pick up these links from the page http://www.codacons.it/rassegna_quest.asp?idSez=14:

'Questionario': OFFICINE PER L'ASSISTENZA E MANUTENZIONI VEICOLI
'Questionario': RIVENDITORE AUTO USATE
'Questionario': RACCOLTA RICICLATA DEI RIFIUTI DI IMBALLAGGI IN PLASTICA
'Questionario': DONNE E POLITICA

Why?

Rusty Fausak
Mimmo

2 Answers

1

I guess I should start with the typical "Don't parse HTML with regex". This would be easy with XPath (using DOMXPath):

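$dom = new DOMDocument();
// @ suppresses the warnings libxml raises on malformed real-world HTML
@$dom->loadHTML($page);
$dom_xpath = new DOMXPath($dom);
// Select every <a> element in the document
$entries = $dom_xpath->evaluate("//a");
foreach ($entries as $entry) {
    // nodeValue is the link text with any nested tags stripped
    echo trim($entry->nodeValue), "<br/>";
}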
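
As a self-contained sketch of the same approach, the snippet below runs the XPath query against a small inline HTML string instead of the live page, and also reads each link's href. The sample markup and the helper name extract_links are made up for illustration; the real page may be structured differently.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = <<<HTML
<html><body>
<a href="articolo.asp?idInfo=1" title="Dettaglio notizia">
    'Questionario':
    DONNE E POLITICA
</a>
<a href="home.asp">Home</a>
</body></html>
HTML;

// Hypothetical helper: returns a list of [href, text] pairs for every <a>.
function extract_links(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings about sloppy HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('//a') as $a) {
        // Collapse runs of whitespace (including newlines) into one space.
        $text = trim(preg_replace('/\s+/', ' ', $a->nodeValue));
        $links[] = [$a->getAttribute('href'), $text];
    }
    return $links;
}

foreach (extract_links($html) as [$href, $text]) {
    echo $href, ' => ', $text, PHP_EOL;
}
```

Because the parser works on elements rather than on lines of text, a link whose text is spread over several lines is handled the same way as any other.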

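But if you must go the regex route, I imagine the greedy star in <a.*> is the source of your problems: it runs to the last > on the line, so the capture group can end up holding the wrong text. Also note that . does not match newlines unless you add the s modifier, so links whose text spans several lines are skipped, which may be what is happening here. Try this:

preg_match_all("@<a[^>]+>(.+?)</a>@s", $page, $matches, PREG_SET_ORDER);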
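
To see the difference, here is a small sketch that runs the pattern from the question and a tightened pattern (using [^>]+ and the s modifier) against a made-up snippet. The $sample markup is an assumption, not copied from the real page; its last link has its text on separate lines.

```php
<?php
// Hypothetical markup: two links on one line, then a link whose text spans lines.
$sample = "<a href=\"a.asp\">First</a> <a href=\"b.asp\">Second</a>\n"
        . "<a href=\"c.asp\">\n  Third\n</a>\n";

// Pattern from the question: greedy .* and no s modifier.
preg_match_all("/<a.*>(.*?)<\/a>/", $sample, $greedy, PREG_SET_ORDER);

// Tighter pattern: [^>]+ cannot run past the tag, s lets . match newlines.
preg_match_all("@<a[^>]+>(.+?)</a>@s", $sample, $fixed, PREG_SET_ORDER);

// Keep only the captured link text from each match, trimmed.
$greedyText = array_map('trim', array_column($greedy, 1));
$fixedText  = array_map('trim', array_column($fixed, 1));

print_r($greedyText); // only "Second": the first link is swallowed, the third is missed
print_r($fixedText);  // "First", "Second", "Third"
```

The greedy version loses the first link because .* consumes everything up to the last > that still allows a match on that line, and it never matches the third link because . cannot cross the line break.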
Rusty Fausak
0

Ah, whatever... Here is a regex tailored to the markup of that page:

$page = file_get_contents('http://www.codacons.it/rassegna_quest.asp?idSez=14');
// Match only the article links; s lets . span newlines, i ignores case
preg_match_all('#<a href="articolo(.*?)" title="Dettaglio notizia">(.*?)</a>#is', $page, $matches);

$count = count($matches[1]);
for($i = 0; $i < $count; $i++){
    // Drop runs of two or more whitespace characters, then any nested tags
    $text = trim(strip_tags(preg_replace('#\s{2,}#', '', $matches[2][$i])));
    echo '<a href="articolo'.$matches[1][$i].'">'.$text.'</a>'."\n";
}

Result:

<a href="articolo.asp?idInfo=138400&amp;id=">'Questionario':OFFICINE PER L'ASSISTENZA E MANUTENZIONI VEICOLI</a>
<a href="articolo.asp?idInfo=138437&amp;id=">'Questionario':RIVENDITORE AUTO USATE</a>
<a href="articolo.asp?idInfo=127900&amp;id=">'Questionario':RACCOLTA RICICLATA DEI RIFIUTI DI IMBALLAGGI IN PLASTICA</a>
<a href="articolo.asp?idInfo=138861&amp;id=">'Questionario':DONNE E POLITICA</a> 
jotik
Dejan Marjanović