
I want to extract a specific piece of information from a website. The difficulty is that this information is dynamic and changes a few times a day.

The goal of my PHP script is to get that content (dynamic content from a database) into a PHP variable.

I've set up a codepen to show you what I mean: https://codepen.io/anon/pen/XEVpBo
The HTML from the codepen:

<div class="wrapper">
  <div class="some_useless_div">
    <p>Some useless text paragraph.</p>
    <div id="another_useless_div">
      <p>The actual important part is: SOME_DYNAMIC_TEXT what I want to put into a variable. The text around that dynamic text is static text and will not change.</p>
    </div>
  </div>
</div>

Currently, what I do to capture the information is to explode around the dynamic information:

$content = file_get_contents('https://codepen.io/anon/pen/XEVpBo');
$parts = explode('The actual important part is: ', $content); // some text that is left of the information.
$parts2 = explode(' what I want to put into a variable.', $parts[1]); // some text that is right of the information.
$information = $parts2[0]; // AHA! Now we have the information!

However, this really feels like spaghetti code. Isn't there a function that maybe searches for a string and returns that value such as:
$information = search_string('The actual important part is: %s what I want to put into a variable.'); where %s would be the information put into the $information variable.

Again, the code above works, but it really feels like bad code. I'm looking for a clean built-in PHP function.

Kevin
  • You could also look into using [`DOMDocument`](https://stackoverflow.com/questions/2571232/parse-html-with-phps-html-domdocument) – Kevin Mar 28 '18 at 01:40
  • Does the site have an API? Are you breaking any terms by scraping it? – Mar 28 '18 at 01:53
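
For reference, the DOMDocument suggestion in the comment above could look roughly like the sketch below. It assumes the markup from the question (in particular the `another_useless_div` id) and PHP's DOM extension. The point is only that the paragraph is located through the DOM first, so the pattern runs against that paragraph's text rather than the whole page. The shortened pattern is an illustrative choice, not something from the question.

```php
<?php
// Sketch only: locate the paragraph by its parent's id, then cut the dynamic
// text out of that paragraph alone. Requires PHP's DOM extension.
$html = <<<'HTML'
<div class="wrapper">
  <div class="some_useless_div">
    <p>Some useless text paragraph.</p>
    <div id="another_useless_div">
      <p>The actual important part is: SOME_DYNAMIC_TEXT what I want to put into a variable.</p>
    </div>
  </div>
</div>
HTML;

libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);

// Select the <p> directly inside the div with the known id.
$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@id="another_useless_div"]/p')->item(0);

$information = null; // stays null if the node or the surrounding text is missing
if ($node !== null
    && preg_match('/important part is: (.*?) what I want/', $node->textContent, $m) === 1) {
    $information = $m[1];
}
var_dump($information);
```

If the page layout changes, `$information` stays `null` instead of triggering an undefined-index notice, which is easier to detect than a half-matched string.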

1 Answer


Maybe you're looking for preg_match?

This test code seems to work fine (https://3v4l.org/6YeSh):

<?php
$html=<<<'HTML'
<div class="wrapper">
  <div class="some_useless_div">
    <p>Some useless text paragraph.</p>
    <div id="another_useless_div">
      <p>The actual important part is: SOME_DYNAMIC_TEXT what I want to put into a variable. The text around that dynamic text is static text and will not change.</p>
    </div>
  </div>
</div>
HTML;
// Lazily capture whatever sits between the two static phrases.
if (preg_match('/The actual important part is: (.*?) what I want to put into a variable\./', $html, $matches)) {
    $str = $matches[1];
    var_dump($str);
}
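
If you want something closer to the `search_string('... %s ...')` call from the question, a small wrapper around preg_match can build the pattern from a template. This is only a sketch: `search_string` is not a PHP built-in, and the two-argument signature (template, haystack) is my own assumption. It splits the template on the first `%s`, escapes both static halves with preg_quote, and captures lazily in between.

```php
<?php
// Hypothetical helper (not a PHP built-in): turns a template containing one
// %s placeholder into a regex and returns what the placeholder matched,
// or null when the template has no placeholder or nothing matches.
function search_string(string $template, string $haystack): ?string
{
    // Split on the first placeholder only.
    $parts = explode('%s', $template, 2);
    if (count($parts) !== 2) {
        return null; // template has no %s placeholder
    }

    // Escape the static text so characters like "." and ":" are matched literally.
    $pattern = '/' . preg_quote($parts[0], '/') . '(.*?)' . preg_quote($parts[1], '/') . '/s';

    return preg_match($pattern, $haystack, $matches) === 1 ? $matches[1] : null;
}

$html = '<p>The actual important part is: SOME_DYNAMIC_TEXT what I want to put into a variable. More static text.</p>';
$information = search_string('The actual important part is: %s what I want to put into a variable.', $html);
var_dump($information);
```

The `s` modifier lets the captured text span line breaks. Drop it if the dynamic part is always on one line.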

Also, if you're after the "best" way, it's definitely not file_get_contents, for at least 2 reasons:

file_get_contents keeps reading until the target server closes the socket, whereas it should stop once Content-Length bytes have been read. Depending on the server, stopping at that point could be much faster.

file_get_contents does not support compressed transfers.

curl returns as soon as Content-Length bytes have been read, and it also supports compressed transfers, so curl should run significantly faster than file_get_contents.
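
A minimal sketch of what the curl version could look like, assuming PHP's curl extension is installed. `fetch_url` is a hypothetical helper name, and the timeout and redirect settings are my own choices rather than anything the question requires. Setting `CURLOPT_ENCODING` to an empty string makes curl offer every compression scheme it supports and decode the response automatically.

```php
<?php
// Sketch of a curl-based replacement for file_get_contents($url).
// fetch_url() is a hypothetical helper name; requires PHP's curl extension.
// Returns the response body, or null if the transfer failed.
function fetch_url(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_ENCODING       => '',              // accept any supported compression, decode automatically
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_FAILONERROR    => true,            // treat HTTP status >= 400 as a failure
        CURLOPT_TIMEOUT        => $timeoutSeconds, // give up after this many seconds
    ]);

    $body = curl_exec($ch);

    return $body === false ? null : $body;
}
```

Usage would then be `$content = fetch_url('https://codepen.io/anon/pen/XEVpBo');`, with a check for `null` before handing `$content` to preg_match.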

(And I disagree: your code is not spaghetti code. I don't think it's good code, because preg_match would have been a better fit than explode(): it's probably faster, uses less memory, and is easier to write and maintain. But your explode code is not spaghetti.)

hanshenrik