strpos not searching for string correctly

Question

I am trying to make a simple script that gets the contents of a page and, when the Order button for a new server appears, sends an email to the address specified. For now, as I am having trouble with it, I am just echoing the result.

This is the code I have at the moment:

<?php
$site = file_get_contents('http://www.soyoustart.com/en/offers/sys-ip-2.xml');

$needle = '<class="order-button"';

if (strpos($site, $needle) !== FALSE)
{
  echo 'Found';
}
else
{
  echo 'Not Found';
}

Currently I get `Not Found` even though that string exists in the contents of the file. What am I doing wrong?

score 4 · Accepted Answer · edited May 23 '17 at 10:25

You assume that the page contains <class="order". But it doesn't; what it does contain is

<div class="zone-dedicated-availability button" 
     data-actions="orderButton"
     data-ref="142sys5"
     data-cgi="order"></div>
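To see why the original check fails, here is a minimal sketch that runs strpos against an inline copy of that markup instead of the live page. The $site string and the choice of data-actions="orderButton" as the needle are assumptions made for illustration; the real page may differ.

```php
<?php
// Hypothetical stand-in for the downloaded page; only the <div> comes from the answer.
$site = '<html><body>'
      . '<div class="zone-dedicated-availability button" data-actions="orderButton"'
      . ' data-ref="142sys5" data-cgi="order"></div>'
      . '</body></html>';

// The question's needle: it never occurs, because no tag starts with "<class".
$badNeedle  = '<class="order-button"';
// A substring that does occur verbatim in the markup above (assumed to be stable).
$goodNeedle = 'data-actions="orderButton"';

// strpos returns an offset or false, so compare strictly against false.
$badFound  = strpos($site, $badNeedle) !== false;
$goodFound = strpos($site, $goodNeedle) !== false;

echo $badFound  ? "Found\n" : "Not Found\n";
echo $goodFound ? "Found\n" : "Not Found\n";
```

So strpos itself behaves correctly; it only matches exact substrings, which makes it fragile as soon as spacing or attribute order changes.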

You possibly need a more powerful tool than strpos (no, not regexps).

If you really are sure the structure of the page/CSS is not going to change too much, you can try to extract all "<div>" tags (recognizable with an easy and reasonable regexp: "<div[^>]+>"), and then check each of them until you find one that contains "orderButton" or something like that. preg_match_all() and array_filter() are probably your friends.

Another very promising possibility is to use an XML library - the URL extension seems to indicate it's possible to access a reasonably structured and well-formed entity tree behind that page. If so, XPath is your friend.

Update

The XML you indicated is not very well formed: it has the non-HTML tags header, footer, and nav, and it has the Italian flag erroneously declared as Flagz/fi instead of Flagz/it, colliding with the Finland flag. This suggests the file was not validated and therefore cannot be trusted to parse reliably, so

simplexml_load_file($address)
      ->xpath('//div[contains(@class, "button")][@data-actions="orderButton"]');

or something like that (e.g. DOMDocument/DOMXPath), while the correct approach, is nonetheless not going to work off-the-shelf. A more permissive XML library is needed; you can try SimpleDOM.

The DOM approach is usually much better because it's far more flexible and does not need awkward 'fixes' to manage things such as the attributes changing their order. Also, several tools cooperate with DOM - for example with Firefox's Firebug extension you can simply grab the XPath off the object. If they change their page layout, then instead of guessing how to extract the data you need, you can just open up the page, copy and paste the new XPath, and Bob's your uncle.
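As a rough illustration of the attribute-order point, the sketch below uses PHP's bundled DOMDocument::loadHTML(), which is generally more forgiving than the XML loaders, on a made-up fragment. The $html sample (with its attributes deliberately shuffled) is an assumption, not the real page, and this has not been tried against the actual site, so it may still choke there.

```php
<?php
// Hypothetical sample: HTML5 tags plus the button, attributes in a different order.
$html = '<html><body><header>Offers</header><nav></nav>'
      . '<div data-ref="142sys5" data-cgi="order" data-actions="orderButton" '
      . 'class="zone-dedicated-availability button"></div>'
      . '<footer></footer></body></html>';

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected warnings and restore the previous error mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select by attribute value; the position of the attribute in the tag is irrelevant.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@data-actions="orderButton"]');

$ref = null;
if ($nodes !== false && $nodes->length > 0) {
    $ref = $nodes->item(0)->getAttribute('data-ref');
}

echo $ref === null ? "Not Found\n" : "Found, data-ref = $ref\n";
```

The same query keeps working if the site reorders or adds attributes, which is exactly where a plain substring or regex match tends to break.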

Otherwise, the brute force solution described above:

$xml = file_get_contents($url);

// Extract all DIVs with a `class` attribute (maybe `data-actions` would be better?)
preg_match_all('#<div[^>]+class[^>]+>#', $xml, $gregs);

// Accept only those with the appropriate data action
$btns = array_values(
    array_filter(
        $gregs[0],
        function($div) {
            return preg_match('#data-actions="orderButton"#', $div);
        }
    )
);

print_r($btns);

will return (unless $btns is empty, of course)

Array
(
    [0] => <div class="zone-dedicated-availability button" data-actions="orderButton" data-ref="142sys5" data-cgi="order">
)

You can then parse it (with XML too - just add '</div>') to access the attributes such as data-ref:

if (count($btns) != 1) {
    die("No button, or too many buttons");
}

$xml = simplexml_load_string($btns[0] . '</div>');
$attrs = array();
foreach ($xml->attributes() as $key => $value) {
    $attrs[$key] = (string)$value;
}

$ref = $attrs['data-ref'];

print $ref;

This will assign to $ref the value '142sys5'. You can var_dump the $attrs array and see the other attributes, if needed.
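If appending '</div>' and going through SimpleXML feels heavy for a single tag, a regex over the captured opening tag is another option. This is only a sketch: it assumes every attribute value is double-quoted and contains no embedded quotes, which holds for the tag shown above but is not guaranteed for the page in general.

```php
<?php
// One opening tag as captured by the brute-force step (copied from the output above).
$tag = '<div class="zone-dedicated-availability button" data-actions="orderButton"'
     . ' data-ref="142sys5" data-cgi="order">';

// Match name="value" pairs; assumes double-quoted values without embedded quotes.
preg_match_all('#([\w:-]+)="([^"]*)"#', $tag, $pairs, PREG_SET_ORDER);

// Build a name => value map from the captured groups.
$attrs = array();
foreach ($pairs as $pair) {
    $attrs[$pair[1]] = $pair[2];
}

// Fall back to a placeholder if the attribute is missing.
print $attrs['data-ref'] ?? 'no data-ref';
```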

Thanks, I just wanted to mention that when I do this I am returned with: `Array ( [0] => )`... — user3481788, Oct 09 '14 at 20:05

score 1 · Answer 2 · answered Oct 09 '14 at 19:46

Save yourself tons of trouble and take a look at DOMDocument and DOMXPath. Not only is strpos with HTML/XML unreliable, you really shouldn't be parsing/reading HTML/XML without a parser anyway.

So, given the same circumstance/outcome:

<?php
  $site = file_get_contents("http://example.com/");

  $dom = new DOMDocument(); // Spin up a new parser
  $dom->loadHTML($site);    // Load your document in

  $domx = new DOMXPath($dom); // XPath (for finding the button easier)
  $query = '//a[@class="order-button"]'; // find all <a class="order-button">
  $orderButtons = $domx->query($query);

  // check for results
  if ($orderButtons->length > 0){
    // found it (or at least 1). Grab it
    $orderButton = $orderButtons->item(0);
  } else {
    // not found
  }

BTW, someone should contact that site and tell them an .xml extension with HTML content is foolish. ;p

+1 for the *correct* solution. I too suggested that in my answer, but then discovered that their HTML isn't easily parsed. Even the small clever trick of reusing the Italian flag as the Finnish emblem makes DOMparser choke :-( — LSerni, Oct 09 '14 at 20:44
