1

I've spent the past 4 hours trying to write a regex that extracts the information below and puts it all into an array that I can run a for loop on. If this isn't working in about 2 hours, 304 people won't get a text message telling them that our school system now has a cancellation.

http://www.wane.com/generic/weather/closings/School_Delays_and_Closings

<tr class="B">
<td width="35%">Blackhawk Christian School</td>

<td width="25%">Allen</td>

<td width="80%">2 Hour Delay&nbsp;</td>
</tr>

<tr class="S">
<td width="35%">Southwest Allen County Schools</td>

<td width="25%">Allen</td>

<td width="80%">2 Hour Delay&nbsp;</td>
</tr>

What I need is: for each td width="35%", add an entry to an array holding the school system's name together with the information from the matching td width="80%". I don't need this for just one school system; I need to check all of them in the list and display the results to the user.

I'm doing:

$wanetv = get_url_contents("http://www.wane.com/generic/weather/closings/School_Delays_and_Closings");

To grab the webpage.


EDIT:

Tried to convert some C# posted below into PHP... can't quite figure it out. Here's my attempt:

   $a = "<tr class='B'> <td width='35%'>Blackhawk Christian School</td> <td width='25%'>Allen</td> <td width='80%'>2 Hour Delay&nbsp;</td> </tr> <tr class='S'> <td width='35%'>Southwest Allen County Schools</td><td width='25%'>Allen</td><td width='80%'>2 Hour Delay&nbsp;</td> </tr> ";
    $SchoolNameKeyword = "<td width='35%'>";
    $DelayKeyword = "<td width='80%'>";

    while (strlen(strstr($a, $SchoolNameKeyword))>0)
    {

        $a = substr($a,strrpos($a, $SchoolNameKeyword)+strlen($SchoolNameKeyword));
        $schoolName = substr($a, 0,strrpos( $a, "<"));
        $a = substr($a,strrpos($a, $DelayKeyword) + strlen($DelayKeyword));
        $delay = substr( $a, 0,strrpos( $a, "<"));

        $arr[$schoolName] = $delay;
    }
        print_r($arr);

Prints out:

Array
(
    [Southwest Allen County SchoolsAllen2 Hour Delay  ] => 2 Hour Delay  
)
Mateen Ulhaq
E3pO
  • Guessing the regex would be something like R?<1>]*>\s*)", as posted in http://stackoverflow.com/questions/4276498/regex-for-extracting-only-tr-with-tds – E3pO Dec 06 '10 at 11:46
  • Obligatory plug http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – BoltClock Dec 06 '10 at 11:47

5 Answers

8

You would really, really, really be better off using an HTML parser here instead of Regular Expressions... especially when you don't control the source, and they could easily break your regex parsing, while HTML parsing would be somewhat more likely to stay working.

Andrew Barber
  • This is duckspeak without a proper example. PHP comes with a native DOM parser, and has quite a few additional libraries. Can you demonstrate? – Kobi Dec 06 '10 at 11:54
  • Do you happen to have any examples of reading table information using the html dom api? Possibly http://www.phpro.org/examples/Parse-HTML-With-PHP-And-DOM.html ? – E3pO Dec 06 '10 at 11:56
7

You would really, really, really be better off using an HTML parser here instead of Regular Expressions... especially when you don't control the source, and they could easily break your regex parsing, while HTML parsing would be somewhat more likely to stay working.

- Andrew Barber

An example using PHP's DOM extension might look something like the following. However, I would take exception to Andrew's comment about HTML parsing being "somewhat more likely to stay working": changes in the source HTML may affect it just as much as any regular expression.

$doc = new DOMDocument;

// Temporarily use "internal" XML error handling to keep HTML warnings quiet
libxml_use_internal_errors(true);
$doc->loadHTMLFile('http://www.wane.com/generic/weather/closings/School_Delays_and_Closings');
libxml_use_internal_errors(false);

// Find each <tr> for our schools
$xpath = new DOMXPath($doc);
$rows  = $xpath->query('//h2[.="Schools: ALL"]/following-sibling::table/tbody/tr[count(td) = 3]');

// Build array of name, county and delay information for each school
$schools = array();
foreach ($rows as $row) {
    $tds    = $row->getElementsByTagName('td');
    $school = $tds->item(0)->textContent;
    $info   = $tds->item(2)->textContent;
    $schools[$school] = $info;
}

echo "Found {$rows->length} schools:" . PHP_EOL;
print_r($schools);

The above uses classes/techniques that you are probably not familiar with. Do ask questions.
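
For anyone puzzled by the XPath expression, here is a rough, self-contained sketch that runs the same kind of query against an inline string instead of the live page. The markup in $html is an assumption made up for illustration (only the heading text and the two school rows come from this thread), and the query is a variant: it drops the tbody step, because as far as I know libxml does not add a <tbody> that is missing from the source, and it adds [1] so that only the first table after the heading is searched. PHP_EOL is simply the newline constant for the current platform.

```php
<?php
// Hypothetical stand-in for the live page; only the heading text and the two
// school rows are taken from the thread, the rest is assumed markup.
$html = '<html><body>
<h2>Schools: ALL</h2>
<table>
<tr><td colspan="3">Updated this morning</td></tr>
<tr class="B"><td>Blackhawk Christian School</td><td>Allen</td><td>2 Hour Delay&nbsp;</td></tr>
<tr class="S"><td>Southwest Allen County Schools</td><td>Allen</td><td>2 Hour Delay&nbsp;</td></tr>
</table>
<h2>Businesses</h2>
<table><tr><td>Some Business</td><td>Allen</td><td>Closed</td></tr></table>
</body></html>';

$doc = new DOMDocument;
libxml_use_internal_errors(true);   // keep warnings about sloppy HTML quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$query = '//h2[.="Schools: ALL"]'          // any <h2> whose text is exactly this
       . '/following-sibling::table[1]'    // the first <table> after that heading
       . '//tr[count(td) = 3]';            // rows with exactly three <td> children
$rows = $xpath->query($query);

$names = array();
foreach ($rows as $row) {
    $names[] = trim($row->getElementsByTagName('td')->item(0)->textContent);
}

echo "Found {$rows->length} schools:" . PHP_EOL;
echo implode(PHP_EOL, $names) . PHP_EOL;
```

The colspan row is skipped because it has only one cell, and the second table is skipped because of the [1] predicate.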

Community
salathe
  • Thank you very much, :). I will be doing some tests on my own, and if I don't understand anything I will indeed ask questions. I've never seen . PHP_EOL used to end a line before, and I'm also curious about the xpath->query and the complicated mess that looks like. – E3pO Dec 06 '10 at 21:42
1
$a = "<tr class='B'> <td width='35%'>Blackhawk Christian School</td> <td width='25%'>Allen</td> <td width='80%'>2 Hour Delay&nbsp;</td> </tr> <tr class='S'> <td width='35%'>Southwest Allen County Schools</td><td width='25%'>Allen</td><td width='80%'>2 Hour Delay&nbsp;</td> </tr> "; 

$SchoolNameKeyword = "<td width='35%'>"; 
$DelayKeyword = "<td width='80%'>"; 
$schoolNames = array();
$delays = array();

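// strpos() finds the FIRST occurrence, so rows are consumed left to right
// (strrpos() would jump straight to the last row and skip the others)
while (($start = strpos($a, $SchoolNameKeyword)) !== false)
{
    // Drop everything up to and including the opening tag of the name cell
    $a = substr($a, $start + strlen($SchoolNameKeyword));
    $schoolNames[] = substr($a, 0, strpos($a, "<"));

    // Skip ahead to the delay cell of the same row
    $a = substr($a, strpos($a, $DelayKeyword) + strlen($DelayKeyword));
    $delays[] = substr($a, 0, strpos($a, "<"));
}

// Compare $i against the number of entries, not against the array itself
for ($i = 0; $i < count($delays); $i++) {
    echo "School: " . $schoolNames[$i] . "\n";
    echo "Delay: " . $delays[$i] . "\n";
}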
Pabuc
  • It looks fine? What is the problem? – Pabuc Dec 06 '10 at 12:22
  • ok have 2 arrays instead of 1. Add your items like this to your array: $schoolnames[] = $schoolname; And to print them, loop in your arrays: foreach ($schoolnames as $i => $value) { unset($schoolnames [$i]); } – Pabuc Dec 06 '10 at 12:28
  • Thank you, for some reason it's displaying "Southwest Allen County SchoolsAllen2 Hour Delay" I figured out a way to do it with arrays which i can then grab the array and see the delay information, i updated my edit above. Can you help strip down to just the name? – E3pO Dec 06 '10 at 12:41
  • Seems to put my server into an infinite loop. – E3pO Dec 06 '10 at 13:12
  • Try echoing the values you put into array, echo something when you get out of the while loop, and echo the value of $i in the last for loop, I'm not testing what I'm writing here and I'm not good at php but this is the right way of doing it and you should be close to finding the solution, just try your best to see the values.. – Pabuc Dec 06 '10 at 13:14
0

Are you sure regex is the best way to solve this problem? What about using some kind of HTML DOM API to traverse the table?

Rune Aamodt
0

Using phpQuery/QueryPath is the simplest option. It's doable with regular expressions, but difficult to get right for newcomers.
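
For reference, a regex version could look roughly like the sketch below. It assumes the cells are written exactly as in the question (double-quoted width="35%" and width="80%" attributes, one name cell before each delay cell per row), so any change to that markup would break it, which is the usual argument against this approach.

```php
<?php
// Sample rows copied from the question.
$html = '<tr class="B">
<td width="35%">Blackhawk Christian School</td>
<td width="25%">Allen</td>
<td width="80%">2 Hour Delay&nbsp;</td>
</tr>
<tr class="S">
<td width="35%">Southwest Allen County Schools</td>
<td width="25%">Allen</td>
<td width="80%">2 Hour Delay&nbsp;</td>
</tr>';

// The lazy (.*?) groups stop at the first closing tag; the "s" flag lets "."
// cross the line breaks between the cells of one row.
$pattern = '~<td width="35%">(.*?)</td>.*?<td width="80%">(.*?)</td>~s';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$closings = array();
foreach ($matches as $match) {
    $school = trim($match[1]);
    // Turn the literal &nbsp; entity into a space before trimming
    $status = trim(str_replace('&nbsp;', ' ', $match[2]));
    $closings[$school] = $status;
}

print_r($closings);
```

With PREG_SET_ORDER each element of $matches is one row: index 1 is the school name and index 2 is the delay text.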

A good alternative is to just use an HTML <table> to array conversion class. Since your data is already in a useful structure the workaround over DOM nodes seems wacky. There are some quick to google examples:
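
To give an idea of what such a conversion does, here is a minimal sketch built on PHP's bundled DOM extension. The function name html_table_to_array is made up for this example and is not taken from any particular library; it simply returns every row as a list of trimmed cell texts.

```php
<?php
/**
 * Hypothetical helper: returns each <tr> in $html as an array of cell texts.
 * Rows of nested tables, if any, are returned as well.
 */
function html_table_to_array($html)
{
    $doc = new DOMDocument;
    $previous = libxml_use_internal_errors(true);  // silence HTML warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the old setting

    $table = array();
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $row = array();
        foreach ($tr->childNodes as $cell) {
            if ($cell->nodeName === 'td' || $cell->nodeName === 'th') {
                // &nbsp; arrives as U+00A0 (bytes C2 A0 in UTF-8); make it a
                // plain space so that trim() can remove it
                $row[] = trim(str_replace("\xC2\xA0", ' ', $cell->textContent));
            }
        }
        if ($row) {
            $table[] = $row;
        }
    }
    return $table;
}

// Sample rows from the question, wrapped in a <table>.
$rows = html_table_to_array('<table>
<tr class="B"><td width="35%">Blackhawk Christian School</td><td width="25%">Allen</td><td width="80%">2 Hour Delay&nbsp;</td></tr>
<tr class="S"><td width="35%">Southwest Allen County Schools</td><td width="25%">Allen</td><td width="80%">2 Hour Delay&nbsp;</td></tr>
</table>');

foreach ($rows as $row) {
    list($school, $county, $status) = $row;
    echo "$school ($county): $status" . PHP_EOL;
}
```

The result is a plain array of arrays, so the usual foreach and array functions apply without touching DOM nodes again.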

mario