I'm trying to capture the content of a div from an html page with this code:

$content = file_get_contents('http://player.rockfm.fm/');

$content = preg_replace("/\r\n+|\r+|\n+|\t+/i", " ", $content);

preg_match('/<div id=\"metadata_player\">(.*?)<\/div>/', $content , $matchs);

print_r($matchs);

The result is empty because the content of that div is generated by JavaScript (AJAX) after the page loads. Is there any other way than using https://github.com/neorai/php-webdriver?

Solution:

    $result = file_get_contents("http://bo.cope.webtv.flumotion.com/api/active?format=json&podId=78");
    // First decode: the outer map with id, uuid and value
    $array_full = json_decode($result, true);
    // Second decode: value is itself a JSON string holding image, author and title
    $array_song = json_decode($array_full['value'], true);

    echo "Author: ".$array_song['author'];
    echo "<br>Title: ".$array_song['title'];

Thanks to @urban and to the question "How to use cURL to get jSON data and decode the data?"

rai
  • Why not capture it with Javascript! See [phantomjs](http://phantomjs.org) – urban Nov 10 '17 at 16:06
  • Throw this away `$content = preg_replace("/\r\n+|\r+|\n+|\t+/i", " ", $content);` Change this `'/<div id=\"metadata_player\">(.*?)<\/div>/'` to `'/(?s)<div id=\"metadata_player\">(.*?)<\/div\s*>/'` –  Nov 10 '17 at 16:21

1 Answer

This page loads weirdly (it seems to fire 3 loadFinished events). Anyhow, the following code works:

// "Normal" JS
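function waitForMetadata() {
    // Read the div inside the page context. Return plain strings, since
    // page.evaluate is only documented to pass JSON-serializable values.
    var meta = page.evaluate(function() {
        var el = document.getElementById("metadata_player");
        return el ? { inner: el.innerHTML, outer: el.outerHTML } : null;
    });

    console.log("meta: '" + (meta ? meta.outer : "") + "'");
    if (meta && meta.inner !== "") {
        phantom.exit(0);
    } else {
        // Still empty (or not in the DOM yet): check again in one second
        setTimeout(waitForMetadata, 1000);
    }
}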


// PhantomJS
var page = require('webpage').create();
// Register the handler before opening the page
page.onLoadFinished = function(status) {
    console.log("Status: " + status);
    if (status !== "success") {
        console.log("FAIL!");
        phantom.exit(1);
        return; // do not start polling after a failed load
    }

    waitForMetadata();
};
page.open('http://player.rockfm.fm/');

The first part is a function that prints the div and checks its contents: if it is empty, the function reschedules itself; otherwise it exits. The second part is straight out of the PhantomJS tutorial: it declares a page, registers an onLoadFinished handler and loads the page.

Example output:

urban@kde-2:/tmp$ phantomjs  ./test.js 
Status: success
meta: '<div id="metadata_player"></div>'
Status: success
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player">GUNS N' ROSES<br><span id="artist">KNOCKIN' ON HEAVEN'S DOOR</span></div>'

NOTE: Once the content is loaded, with JS you can do whatever you like (instead of printing). Also, I think you want to use the span id=artist later on...
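
As one possible follow-up to the note above, here is a minimal sketch of splitting the div's inner HTML into its two text parts in plain JavaScript. The helper name splitMetadata and the field names are hypothetical, and the pattern assumes the markup always has the shape seen in the example output (text, a br, then the span with id artist); it is not a general HTML parser.

```javascript
// splitMetadata is a hypothetical helper; it assumes the inner HTML looks like
// TEXT<br><span id="artist">TEXT</span>, as in the example output.
function splitMetadata(innerHtml) {
    var m = /^(.*?)<br\s*\/?>\s*<span id="artist">(.*?)<\/span>/.exec(innerHtml);
    if (!m) {
        return null; // unexpected markup, or the div is still empty
    }
    return { beforeBreak: m[1], artistSpan: m[2] };
}

// Input string copied from the last line of the example output
var parts = splitMetadata("GUNS N' ROSES<br><span id=\"artist\">KNOCKIN' ON HEAVEN'S DOOR</span>");
console.log(parts.beforeBreak);
console.log(parts.artistSpan);
```

If the player ever changes its markup, the function returns null instead of guessing, so the caller can keep polling or report the problem.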

UPDATE 1:

This made me stubborn... I could not make it work with phantomjs; however, I inspected the AJAX call this page makes, and it seems that you can get the currently playing song with:

$ curl 'http://bo.cope.webtv.flumotion.com/api/active?format=json&podId=78'
{"id": null, "uuid": "DFLT", "value": "{\"image\": \"\", \"author\": \"AEROSMITH\", \"title\": \"AMAZING\"}"}

This means you can use any language you like and json_decode twice: (1) for the outer map having id, uuid and value, and (2) to decode the value. My only concern would be if podId changes... but it seems static.
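
To illustrate the double decode, here is a minimal JavaScript sketch that works on the sample response shown above rather than on a live request. The helper name decodeActiveSong is hypothetical, and the field names author and title are taken from that single sample, so a real response may differ.

```javascript
// Sample response copied from the curl output above; a live call may differ.
var sample = '{"id": null, "uuid": "DFLT", "value": "{\\"image\\": \\"\\", \\"author\\": \\"AEROSMITH\\", \\"title\\": \\"AMAZING\\"}"}';

// decodeActiveSong is a hypothetical helper name, not part of any API.
function decodeActiveSong(body) {
    // First decode: the outer map with id, uuid and value
    var outer = JSON.parse(body);
    // Second decode: value is itself a JSON document stored as a string
    return JSON.parse(outer.value);
}

var song = decodeActiveSong(sample);
console.log("Author: " + song.author);
console.log("Title: " + song.title);
```

Decoding the inner string as JSON, instead of splitting it on commas and colons, keeps working when an author or title itself contains those characters.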

Hope it helps

urban
  • I tried the same but without the waitForMetadata function and I had this error: "VIDEOJS: ERROR: (CODE: 4 MEDIA_ERR_SRC_NOT_SUPPORTED) Not compatible source was found for this media. [Object Object]" I have tried your solution and I get the same error; I can see the empty div, but I never get the content, even after waiting 4 minutes. – rai Nov 10 '17 at 17:07
  • Hi @rai, seems to be something wrong with `phantomjs` version. Trying now with different version, I get jQuery errors (`$` not defined)... – urban Nov 11 '17 at 11:06
  • what about `http://bo.cope.webtv.flumotion.com/api/active?format=json&podId=78`? Should give you what you need no? – urban Nov 14 '17 at 17:49
  • yes, works :) Yesterday I was looking for how to get the JSON; the only thing I was missing was the URL. I was looking at Firefox -> Inspect -> Network -> Response and did not go through the headers to take the URL, my fail hahaha. Now I will edit the question and add the solution – rai Nov 14 '17 at 18:08