I'm trying to follow a tutorial for web scraping with PHP.
I understand roughly what's going on, but I don't get how to filter what has been scraped to get exactly what I want. For example:
<?php
// Load the saved page into a string
$file_string = file_get_contents('page_to_scrape.html');
// Capture everything between the <title> tags into $title[1]
preg_match('/<title>(.*)<\/title>/i', $file_string, $title);
$title_out = $title[1];
?>
I see that the (.*)
will retrieve everything between the title tags. Can I use regular expressions to get more specific information? Say the title contained Welcome visitor #100
How would I get the number that comes after the hash?
Or do I have to retrieve everything between the tags and then manipulate it later?
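To make the question concrete, here is a minimal sketch of what I imagine the first option (doing it all in the regular expression) might look like. It assumes the title always contains a single "#" followed by digits. The sample string and the names $match and $visitor_number are made up for illustration and are not from the tutorial.

```php
<?php
// Hypothetical sample input; in the tutorial this would come from file_get_contents().
$file_string = '<html><head><title>Welcome visitor #100</title></head></html>';

// [^<]* keeps the match inside the title tags;
// (\d+) is the capture group holding only the digits after the "#".
if (preg_match('/<title>[^<]*#(\d+)[^<]*<\/title>/i', $file_string, $match)) {
    $visitor_number = (int) $match[1];
} else {
    // No title found, or no "#number" inside it.
    $visitor_number = null;
}
?>
```

Is narrowing the capture group like this the right idea, or is it better to grab the whole title first and run a second preg_match on it?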