
I'm trying to do some web scraping from my virtual web server. I'm looking for the names of the projects and their creators on the page, for example "Bring THE PEOPLE TO COME to New York City" by Yanira Castro.

This information is located in `bbcard_name`.

My problem is that the array and the CSV I get at the end of the script are always empty...

```php
<?php

set_time_limit(0);

$data = array();

// Download the category listing page
$listpage = file_get_contents('http://www.kickstarter.com/discover/categories/dance/');

// Capture each project's page name (group 1) and link text (group 2)
preg_match_all('#<h2> <a href="([A-Z]+)\.html">([a-zA-Z ]+)</a></li>#', $listpage, $pagesurl);

foreach ($pagesurl[1] as $pageurl) {

    $projectPage = file_get_contents('http://www.kickstarter.com/discover/categories/dance/' . $pageurl . '.html');

    // Project title; kept in $title so the loop below does not overwrite it
    preg_match('#<h2>bbcard_name ([a-zA-Z ]+)</h2>#', $projectPage, $title);
    $title = isset($title[1]) ? $title[1] : '';

    // Text of every link inside an <h2> heading on the project page
    preg_match_all('#<h2><a href="https?://.+\.[a-z]{2,5}">([^<]+)</a>#', $projectPage, $namefound);

    foreach ($namefound[1] as $name) {
        if (!isset($data[$name])) {
            $data[$name] = array('name' => $name);
        } else {
            $data[$name]['name'] .= ' - ' . $name;
        }
    }
}

print_r($data);

$out = fopen('data.csv', 'w');
fputcsv($out, array('Titre'));

// Use $row, not $data, so the loop does not overwrite the array it iterates
foreach ($data as $row) {
    $name = isset($row['name']) ? $row['name'] : '';
    fputcsv($out, array($name));
}

fclose($out);

echo "FINITO";
exit;

?>
```
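To find out which step leaves the result empty, it can help to print a summary of each intermediate value before moving on. The sketch below uses a hypothetical helper, `describeStage()`, which is not part of the original script. It only reports whether a value is `false` (what `file_get_contents()` and `preg_match_all()` return on failure), how many bytes a string holds, or how many items an array holds.

```php
<?php
// Hypothetical helper (not in the original script): summarise what one
// scraping stage produced so the first empty stage stands out.
function describeStage(string $label, $value): string
{
    if ($value === false || $value === null) {
        // file_get_contents() and preg_match_all() return false on failure
        return "[$label] FAILED";
    }
    if (is_array($value)) {
        $size = count($value) . ' item(s)';
        $empty = count($value) === 0;
    } else {
        $value = (string) $value;
        $size = strlen($value) . ' byte(s)';
        $empty = $value === '';
    }
    // Flag empty results explicitly so they are easy to spot in the output
    return "[$label] $size" . ($empty ? ' EMPTY' : '');
}
```

For example, `echo describeStage('category page', $listpage), "\n";` right after the first download, and `echo describeStage('project links', $pagesurl[1]), "\n";` right after `preg_match_all()`. If the second line reports `0 item(s) EMPTY`, the first regex is failing to match, and the CSV code is not the culprit.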

Thanks

Coquinoob
  • First, ditch the regexes. You do **NOT** parse HTML with regexes; the slightest malformation in the HTML and your regexes are toast. Use [DOM](http://php.net/dom). You also have absolutely NO debug output in there whatsoever. Did you check whether you actually get any HTML from Kickstarter? Did you check the intermediate stages to see if they're working? – Marc B May 07 '13 at 16:40
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Tivie May 07 '13 at 16:48
  • You could try http://simplehtmldom.sourceforge.net/ which is easy and simple. – dee May 07 '13 at 16:51
  • As you can see, I'm just starting out with this. Could you post a corrected example using DOM as an answer, or is my question unanswerable as it stands? – Coquinoob May 07 '13 at 16:56
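Since the comments recommend DOM over regexes, here is a minimal sketch of that approach using PHP's built-in `DOMDocument` and `DOMXPath`. It assumes, based only on the example in the question, that each project's title and creator sit inside an element with the class `bbcard_name` whose visible text reads "<title> by <creator>". The real Kickstarter markup may differ, so check the downloaded HTML first. `extractProjects()` and `writeCsv()` are hypothetical helper names.

```php
<?php
// Sketch only: assumes each project card has an element with class
// "bbcard_name" whose text reads "<title> by <creator>".
function extractProjects(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match elements whose class list contains the whole word "bbcard_name".
    $nodes = $xpath->query(
        '//*[contains(concat(" ", normalize-space(@class), " "), " bbcard_name ")]'
    );

    $projects = [];
    foreach ($nodes as $node) {
        // Collapse the whitespace left over from nested tags and indentation.
        $text = trim(preg_replace('/\s+/u', ' ', $node->textContent));
        // Split on the last " by " so a title that itself contains " by " stays intact.
        $pos = strrpos($text, ' by ');
        if ($pos === false) {
            $projects[] = ['title' => $text, 'creator' => ''];
        } else {
            $projects[] = [
                'title'   => substr($text, 0, $pos),
                'creator' => substr($text, $pos + 4),
            ];
        }
    }
    return $projects;
}

// Write a header row, then one CSV row per project.
function writeCsv(string $path, array $projects): void
{
    $out = fopen($path, 'w');
    fputcsv($out, ['Titre', 'Creator'], ',', '"', '\\');
    foreach ($projects as $project) {
        fputcsv($out, [$project['title'], $project['creator']], ',', '"', '\\');
    }
    fclose($out);
}
```

On the category page, this would be called as `extractProjects(file_get_contents('http://www.kickstarter.com/discover/categories/dance/'))` after checking that the download did not return `false`, with the result passed to `writeCsv('data.csv', ...)`. The `contains(concat(...))` XPath idiom matches `bbcard_name` as a whole word in the class list, so `class="project bbcard_name"` is found while `class="bbcard_names"` is not.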
