1

Here's the code I'm running.

Basically, I scrape data and place it into simple POCO classes. At the end of the loop I want to add the $newItem object to the $parsedItems array. I'm new to PHP; could this be a scoping issue?

<h1>Scraper Noticias</h1>

<?php

include('simple_html_dom.php');

class News {
    var $image;
    var $fechanoticia;
    var $title;
    var $description;
    var $sourceurl;

    function get_image( ) {
        return $this->image;
    }

    function set_image ($new_image) {
        $this->image = $new_image;
    }

    function get_fechanoticia( ) {
        return $this->fechanoticia;
    }

    function set_fechanoticia ($new_fechanoticia) {
        $this->fechanoticia = $new_fechanoticia;
    }

    function get_title( ) {
        return $this->title;
    }

    function set_title ($new_title) {
        $this->title = $new_title;
    }

    function get_description( ) {
        return $this->description;
    }

    function set_description ($new_description) {
        $this->description = $new_description;
    }

    function get_sourceurl( ) {
        return $this->sourceurl;
    }

    function set_sourceurl ($new_sourceurl) {
        $this->sourceurl = $new_sourceurl;
    }
}

// Create DOM from URL or file
$initialPage = file_get_html('http://www.uvm.cl/noticias_mas.shtml');


// Declare variable to hold all parsed news items.
$parsedNews = array();

// Since the University blog page has 262 pages, we'll iterate through that.
for ($i = 2; $i <= 5; $i++) {
    $url = "http://www.uvm.cl/noticias_mas.shtml?AA_SL_Session=34499aef1fc7a296fb666dcc7b9d8d05&scrl=1&scr_scr_Go=" . $i;
    $page = file_get_html($url);
    parse_page_for_news($page);
}

echo "<h1>Final Count:" . count($parsedNews) . "</h1>";

// Function receives an HTML Dom object, and the library works against that single HTML object.
function parse_page_for_news ($page) {

    foreach($page->find('#cont2 p') as $element) {

        $newItem = new News;

        // Parse the news item's thumbnail image.
        foreach ($element->find('img') as $image) {
            $newItem->set_image($image->src);
            //echo $newItem->get_image() . "<br />";
        }

        // Parse the news item's post date.
        foreach ($element->find('span.fechanoticia') as $fecha) {
            $newItem->set_fechanoticia($fecha->innertext);
            //echo $newItem->get_fechanoticia() . "<br />";
        }

        // Parse the news item's title.
        foreach ($element->find('a') as $title) {
            $newItem->set_title($title->innertext);
            //echo $newItem->get_title() . "<br />";
        }

        // Parse the news item's source URL link.
        foreach ($element->find('a') as $sourceurl) {
            $newItem->set_sourceurl("http://www.uvm.cl/" . $sourceurl->href);
        }

        // Parse the news items' description text.
        foreach ($element->find('a') as $link) {
            $link->outertext = '';
        }

        foreach ($element->find('span') as $link) {
            $link->outertext = '';
        }

        foreach ($element->find('img') as $link) {
            $link->outertext = '';
        }

        $newItem->set_description($element->innertext);

        // Add the newly formed NewsItem to the $parsedNews object.
        $parsedNews[] = $newItem;

        print_r($newItem);
        echo "<br /><br /><br />";

    }
} 

?>

In my current understanding of the language, since the $parsedItems object is declared outside of the function, shouldn't it correctly be added?

Why would my count() call return 0, as if it had no objects in it?

BenMorel
Only Bolivian Here
  • You probably mean `$parsedNews`. There's no `$parsedItems` in your code. – Madara's Ghost Jul 27 '12 at 19:04
  • As an aside, you don't need to manually write setters and getters if you're only using them as POCO-ish properties. PHP has what are known as 'magic methods' that dynamically intercept when someone is trying to access object code. Take a look here: http://stackoverflow.com/questions/6550550/dumb-experiement-creating-c-esque-properties-in-php – Major Productions Jul 27 '12 at 19:04
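As a rough illustration of the magic-method idea from the comment above, here is a minimal sketch. The trimmed-down News class and its private $data array are hypothetical; only the __get and __set hooks are part of PHP itself.

```php
<?php

// Hypothetical variant of News: values live in one private array and are
// reached through PHP's __get/__set magic methods.
class News {
    private $data = array();

    // Called when code writes to an inaccessible or undefined property.
    public function __set($name, $value) {
        $this->data[$name] = $value;
    }

    // Called when code reads an inaccessible or undefined property.
    public function __get($name) {
        return isset($this->data[$name]) ? $this->data[$name] : null;
    }
}

$item = new News;
$item->title = 'Sample headline';   // routed through __set
echo $item->title . "\n";           // routed through __get
```

Whether this is preferable to plain public properties depends on how much validation the class is expected to do; for a pure data holder, public properties may be simpler still.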

5 Answers

5

It is indeed a scoping issue. This will not work:

$foo = array();

function bar()
{
    $foo[] = 'baz';
}

bar();
var_dump($foo); // will output an empty array

What you want to do is the following:

$parsedNews = array();

// loop through the items as you are doing now
for ($i = 2; $i <= 5; $i++) {
    $url = "http://www.uvm.cl/noticias_mas.shtml?AA_SL_Session=34499aef1fc7a296fb666dcc7b9d8d05&scrl=1&scr_scr_Go=" . $i;
    $page = file_get_html($url);
    $newItems = parse_page_for_news($page);

    $parsedNews = array_merge($parsedNews, $newItems);
}

And have the parse_page_for_news function collect its items in a local array (also named $parsedNews here, but separate from the outer one) and return it once its foreach loop has finished:

return $parsedNews;

Please never ever use the global keyword and please don't pass by reference unless you have a really good reason.
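Putting the pieces of this answer together, a minimal sketch might look like the following. To keep it runnable without simple_html_dom or network access, each page is replaced by a plain array of title strings and the News class is trimmed to one public property; both are stand-ins for illustration, not part of the original code.

```php
<?php

// Trimmed stand-in for the asker's News class (hypothetical simplification).
class News {
    public $title;
}

// Stand-in parser: $page is a plain array of titles here, not a
// simple_html_dom object, so the sketch runs without the library.
function parse_page_for_news(array $page) {
    // Local array: it exists only inside this function call.
    $items = array();

    foreach ($page as $title) {
        $newItem = new News;
        $newItem->title = $title;
        $items[] = $newItem;
    }

    // Hand the result back instead of touching outer variables.
    return $items;
}

$parsedNews = array();

// Hypothetical sample data standing in for the scraped pages.
$pages = array(
    array('First headline', 'Second headline'),
    array('Third headline'),
);

foreach ($pages as $page) {
    $newItems = parse_page_for_news($page);
    $parsedNews = array_merge($parsedNews, $newItems);
}

echo "Final Count: " . count($parsedNews) . "\n";
```

The local array is named $items here only to make it obvious that it is a different variable from the outer $parsedNews.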

PeeHaa
3

No. You're misunderstanding the concept of variable scope.

Consider the following:

$foo = "bar";

function change_foo($new_foo) {
    $foo = $new_foo;
}

change_foo("New Foo!");
echo $foo;

The output in this case would still be "bar". That's because the $foo inside change_foo() is a separate, local variable that exists only within the function's scope.

If we were to, however, do something like this (the proper way):

$foo = "bar";

function change_foo($new_foo) {
    $foo = $new_foo;
    return $foo;
}

$foo = change_foo("New Foo!");
echo $foo;

The result would indeed be "New Foo!".

Another (less recommended) way of doing this is to pass the variable by reference:

$foo = "bar";

function change_foo(&$old_foo, $new_foo) {
    $old_foo = $new_foo;
}

change_foo($foo, "New Foo!");
echo $foo;

The reason this is not recommended is that it's not obvious from the calling code that $foo is changed (although the descriptive name I gave the function makes it seem obvious enough here).

The worst way of doing this is to move $foo into the global state.

$foo = "bar";

function change_foo($new_foo) {
    global $foo;
    $foo = $new_foo;
}

change_foo("New Foo!");
echo $foo;

Once $foo is global, any function anywhere in the program can access and change it. If the function's name weren't so obvious, we would have no way of knowing from the call that it changed the value of $foo at all!

Madara's Ghost
2

In my current understanding of the language, since the $parsedItems object is declared outside of the function, shouldn't it correctly be added?

Nope, you'll need to pass it into the function, just like with C#.
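One caveat, sketched below on the assumption that the array is passed as an ordinary by-value argument: PHP gives the function its own copy of an array argument, so passing it in is not enough on its own. The function also has to return the modified array (or take it by reference). The helper name add_item is hypothetical.

```php
<?php

// Hypothetical helper: receives a copy of the array, appends, returns it.
function add_item(array $items, $item) {
    $items[] = $item;   // modifies the local copy only
    return $items;
}

$parsedNews = array();

add_item($parsedNews, 'ignored');            // return value discarded
$countAfterDiscard = count($parsedNews);     // caller's array is unchanged

$parsedNews = add_item($parsedNews, 'kept'); // reassign to keep the change
$countAfterAssign = count($parsedNews);

echo $countAfterDiscard . " " . $countAfterAssign . "\n";
```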

Major Productions
0

You could just add

global $parsedNews;

inside your function body. However, I think it is better coding practice to pass the array to the function by reference if you need to modify it and have the modified value reflected in the global scope. To do that, simply change your function signature to this:

function parse_page_for_news ($page, &$parsedNews)
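
A minimal sketch of how that signature might be used, assuming a plain array of title strings stands in for the simple_html_dom page object (the sample data is hypothetical):

```php
<?php

// Stand-in: $page is a plain array of titles (hypothetical), not a DOM object.
function parse_page_for_news(array $page, array &$parsedNews) {
    foreach ($page as $title) {
        // Appends to the caller's array because of the & in the signature.
        $parsedNews[] = $title;
    }
}

$parsedNews = array();
parse_page_for_news(array('First headline', 'Second headline'), $parsedNews);
parse_page_for_news(array('Third headline'), $parsedNews);

echo "Final Count: " . count($parsedNews) . "\n";
```

Note that the & appears only in the function signature; the call site passes $parsedNews as usual.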
Mike Brant
-6

Your News object has no brackets. It should be a constructor, like this:

$newItem = new News();

Also, your News class needs a constructor. I am not sure if by not declaring one, it will be automatically given a default constructor (that is, without any arguments)

Chris Baker
bryan.blackbee
  • You don't need parentheses if you are not going to pass parameters to a constructor function, or if a constructor function has default values it can use. – Mike Brant Jul 27 '12 at 18:52
  • Parentheses are not strictly required for object instantiation. – orourkek Jul 27 '12 at 18:52
  • Also, re: your edit - the class doesn't strictly *need* a constructor either. – orourkek Jul 27 '12 at 18:54
  • I fixed the typos, but FYI this answer is not correct. Also, to address the question you raised in your "answer", classes without a constructor simply... don't have a constructor. There isn't a "default" one, nor does a class require one. – Chris Baker Jul 27 '12 at 19:00
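
The points made in these comments can be checked with a short sketch. The one-property News class below is a hypothetical stand-in for the asker's class:

```php
<?php

// Hypothetical minimal class with no constructor declared.
class News {
    public $title;
}

$a = new News;     // no parentheses
$b = new News();   // with parentheses

// Both forms produce an instance of the same class.
$sameClass = get_class($a) === get_class($b);

// No __construct method exists unless the class declares one.
$hasConstructor = method_exists('News', '__construct');

var_dump($sameClass);
var_dump($hasConstructor);
```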