
I am trying to fetch data from example.com using file_get_contents and preg_match_all, but I am not getting the desired result.

The URL is example.com, and the page contains abc="hello". I want to extract hello and store it in a variable. So far I have:

$url = "example.com";
$pagecontent = file_get_contents($url);
preg_match_all('/abc="([^"]+)"/',$pagecontent ,$m); 
print_r($m);

The result I am getting is

Array ( [0] => Array ( ) [1] => Array ( ) )

The result should be hello.
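One way to narrow this down is to run the same pattern against a string that is known to contain the attribute. The sketch below uses a hypothetical `$pagecontent` literal in place of the fetched page; if it prints hello, the pattern is fine and the fetched content is the more likely culprit (for example, `file_get_contents` returns `false` on failure, and a URL without a scheme such as `http://` is treated as a local file path).

```php
<?php
// Hypothetical page content standing in for what file_get_contents() would return.
$pagecontent = '<div abc="hello">text</div>';

// Same pattern as in the question: capture everything between the quotes after abc=.
$count = preg_match_all('/abc="([^"]+)"/', $pagecontent, $m);

// $m[0] holds the full matches, $m[1] holds the first capture group.
$value = $count > 0 ? $m[1][0] : null;

var_dump($value); // string(5) "hello"
```

The captured value lives in `$m[1][0]`, not in `$m` itself, so that is the element to store in a variable.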

DragonFire

2 Answers


If your data is structured, use that structure instead of a direct string approach:

$url = "example.com";
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile($url);
$xp = new DOMXPath($dom);

$nodeList = $xp->query('//@abc');

foreach ($nodeList as $node) {
    echo $node->nodeValue, PHP_EOL;
}

If you want a more customized result, look at the XPath query language.
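To store the value in a variable rather than echo it, the same approach can be sketched on an inline string. The `$html` literal and the `$values` and `$first` names below are illustrative assumptions, not part of the original page; `loadHTML` is used instead of `loadHTMLFile` only so that the example runs without network access.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$html = '<html><body><p abc="hello">first</p><p abc="world">second</p></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Gather every abc attribute value into an array.
$values = [];
foreach ($xp->query('//@abc') as $attr) {
    $values[] = $attr->nodeValue;
}

// The first value, or null when no element carries the attribute.
$first = $values[0] ?? null;

print_r($values);
```

Restricting the query, for example to `//p/@abc`, would limit the result to attributes on `p` elements.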

Casimir et Hippolyte
  • getting Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in example.com, line: 25 in C:\xampp\htdocs\search\crawl.php on line 21 – DragonFire Mar 27 '17 at 23:49
  • @DragonFire: it isn't important; most of the time, HTML files aren't well formed. You can prevent these warnings by redirecting all HTML errors to the libxml error handler. I will add the code to do it. – Casimir et Hippolyte Mar 27 '17 at 23:53
  • You can have an access to these errors using [`libxml_get_errors`](http://php.net/manual/en/function.libxml-get-errors.php) – Casimir et Hippolyte Mar 27 '17 at 23:55
  • ok now getting Invalid argument supplied for foreach() in C:\xampp\htdocs\search\crawl.php on line 48 looking into it – DragonFire Mar 27 '17 at 23:58
  • can we chat.... i can invite you to chat room and work with the real url – DragonFire Mar 28 '17 at 00:10
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/139193/discussion-between-dragonfire-and-casimir-et-hippolyte). – DragonFire Mar 28 '17 at 00:11
  • Sorry, no, I'm not a chatter, but if you want an answer that better fits your case, add a more descriptive example to your question. – Casimir et Hippolyte Mar 28 '17 at 00:13
  • alright no problem, thanks, I am working on the problem now... I guess Python seems a lot easier than PHP for this – DragonFire Mar 28 '17 at 00:14
  • `data-ttlurl=` doesn't exist in the file; `seller-info` does (30 occurrences). In any case, stop looking for stupid solutions based on direct string approaches (regex) and take the time to learn how to use and query the DOM. Doing it in Python instead of PHP is the same problem, and honest users will give you the same answer. – Casimir et Hippolyte Mar 28 '17 at 00:30
  • @DragonFire: Yes! You found the way to do it: https://www.youtube.com/watch?v=_EItANXUklc – Casimir et Hippolyte Mar 28 '17 at 00:39
  • @DragonFire: if you have problems loading the file, use [`libxml_set_streams_context`](http://php.net/manual/en/function.libxml-set-streams-context.php) – Casimir et Hippolyte Mar 28 '17 at 02:05
  • @DragonFire: If that doesn't suffice, use cURL to find out what the exact problem is. – Casimir et Hippolyte Mar 28 '17 at 02:11
  • your code is working fine if you add $url = file_get_contents($url); after the first line, now i am reading about manipulating data using how to use and query the DOM as mentioned by you – DragonFire Mar 28 '17 at 04:12
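The points raised in these comments (collecting libxml errors, the "Invalid argument supplied for foreach()" warning, and `libxml_set_streams_context`) can be sketched together as follows. The `$html` literal and the user-agent string are assumptions made for illustration; the stream context only matters when libxml itself fetches a URL, as `loadHTMLFile` does.

```php
<?php
// Hypothetical markup; the bare "&" in href is the kind of input that can
// produce an htmlParseEntityRef message, depending on the libxml version.
$html = '<html><body><a abc="hello" href="?a=1&b=2">link</a></body></html>';

// Assumed user agent, applied to any URL that libxml opens itself.
$context = stream_context_create(['http' => ['user_agent' => 'Mozilla/5.0 (compatible; example)']]);
libxml_set_streams_context($context);

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);

// Read the collected parser messages, then clear the buffer.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();

$xp = new DOMXPath($dom);
$nodeList = $xp->query('//@abc');

// query() returns false for a malformed expression, so check before looping.
$values = [];
if ($nodeList !== false) {
    foreach ($nodeList as $node) {
        $values[] = $node->nodeValue;
    }
}

print_r($values);
```

Checking `$nodeList !== false` before the loop is one way to avoid the foreach warning when the query cannot be evaluated.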

This works for me!

Read the parameter from the URL:

$subject = $_GET['abc'];

preg_match_all("/^(([^\"]+))$/", $subject, $matches);

print_r($matches);

Luís Chaves