
I copy the source of a web page into a text file, and I am having trouble extracting two data points from it: the latitude and the longitude.

The PHP script I use to fetch the page, save it, and read it back is this:

<?php

$ch = curl_init("http://www.marinetraffic.com/ais/shipdetails.aspx?MMSI=258245000");
$fp = fopen("example_homepage.txt", "w");

curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);

curl_exec($ch);
curl_close($ch);
fclose($fp);

header('Content-Type: text/plain');

$myFile = "example_homepage.txt";
$fh = fopen($myFile, 'r');
$theData = fread($fh, 9251);
fclose($fh);
echo $theData;

?> 

The GPS position is buried in text that looks like this (from the file example_homepage.txt):

<img style="border: 1px solid #aaa" src="flags/NO.gif" />
<br/>
<b>Call Sign:</b>LAJW
<br/>
<b>IMO:</b>9386380,
<b>MMSI:</b>258245000
<br/>
<hr/>
<h2>Last Position Received</h2>
<b>Area:</b>North Sea
<br/>
<b>Latitude / Longitude:</b>
<a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10&type_color=9'>60.39997˚ / 5.311533˚ (Map)</a>
<br/>
<b>Currently in Port:</b>
<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a>
<br/>
<b>Last Known Port:</b>
</b>
<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a>
<br/>
<b>Info Received:</b>0d 0h 20min ago
<br/>
<table>
    <tr>
        <td>&nbsp;
            <img src="shipicons/magenta0.png" />
        </td>
        <td>
            <a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10&type_color=9'><b>Current Vessel's Track</b></a>
        </td>
    </tr>
    <tr>
        <td>
            <img src="windicons/w05_330.png" />
        </td>
        <td>
            <b>Wind:</b>5 knots, 327&deg;, 13&deg;C</td>
    </tr>
</table>
<a href='datasheet.aspx?datasource=ITINERARIES&MMSI=258245000'><b>Itineraries History</b></a>
<br/>
<hr/>
<h2>Voyage Related Info (Last Received)</h2>
<b>Draught:</b>6.8 m
<br/>
<b>Destination:</b>BERGEN HAVN
<br/>
<b>ETA:</b>2012-05-22 18:00
<br/>
<b>Info Received:</b>2012-05-23 18:43 (

The two numbers I want are:

latitude: 60.39085 longitude: 5.32245

I am not so experienced with this kind of thing. Maybe there is a better way. Please let me know.

EDIT: FYI with the last three lines of code, I am able to get the first 9251 characters in the text file.
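
The hard-coded length of 9251 only fits while the saved page stays that size. As a minimal sketch, file_get_contents reads the whole file regardless of its length. The helper name read_saved_page is hypothetical and not part of the script above.

```php
<?php
// Hypothetical helper: returns the full contents of a saved page,
// or null if the file cannot be read.
function read_saved_page(string $path): ?string
{
    if (!is_readable($path)) {
        return null;
    }
    $data = file_get_contents($path);
    return $data === false ? null : $data;
}

// Usage, assuming example_homepage.txt was written by the cURL step:
// $theData = read_saved_page('example_homepage.txt');
```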

Stagleton
  • possible duplicate of [How to parse and process HTML with PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php) – derekerdmann May 25 '12 at 19:00

2 Answers


It may be overkill but you can try PHP DOM + parse_url + parse_str:

$text = file_get_contents('http://example.com/path/to/file.html');
$doc = new DOMDocument('1.0');
// Collect parser warnings quietly; scraped pages are rarely valid HTML
libxml_use_internal_errors(true);
$doc->loadHTML($text);
// The coordinates sit in the href of <a> elements, so walk the links
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    if (strpos($href, 'centerx') !== false) {
        // With PHP_URL_QUERY, parse_url() returns only the query string
        $query = parse_url($href, PHP_URL_QUERY);
        // parse_str() fills the array passed as its second argument
        parse_str($query, $query_values);
        $desired_values = array(
            $query_values['centerx'],
            $query_values['centery']
        );
    }
}
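
To try the DOM + parse_url + parse_str idea without a network request, here is a self-contained sketch that runs on a fragment shaped like the one in the question. The function name extract_position and the shortened sample string are assumptions made for illustration. It also assumes that centery holds the latitude and centerx the longitude, as the "Latitude / Longitude" link in the question suggests.

```php
<?php
// Hypothetical helper: returns [latitude, longitude] from the first link
// whose href carries centerx and centery, or null if there is no such link.
function extract_position(string $html): ?array
{
    $doc = new DOMDocument();
    // Collect parser warnings quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        $query = parse_url($link->getAttribute('href'), PHP_URL_QUERY);
        if (!is_string($query)) {
            continue; // link without a query string
        }
        parse_str($query, $values);
        if (isset($values['centerx'], $values['centery'])) {
            // Assumption: centery is the latitude, centerx the longitude.
            return [(float) $values['centery'], (float) $values['centerx']];
        }
    }
    return null;
}

// Shortened sample modelled on the question's HTML.
$sample = "<b>Latitude / Longitude:</b>"
    . "<a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10'>(Map)</a>";
$position = extract_position($sample);
```

Returning on the first match picks the vessel position link, which comes before the port links in the sample page. If the page layout differs, the loop condition would need adjusting.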
Mihai Stancu
  • hmm, I'm having trouble getting this to work. I have it hosted here: http://thoughtfi.com/search_textdoc.php maybe I didn't execute it correctly? – Stagleton May 25 '12 at 19:13
  • Is the HTML you're fetching well formed? DOM parser can have a hard time accepting badly formed code. – Mihai Stancu May 25 '12 at 19:16
  • Or perhaps your file_get_contents is restricted from accessing data over HTTP (I notice the page has been loading since I started writing this comment). – Mihai Stancu May 25 '12 at 19:17
  • hmm, I am not sure. The txt document is here thoughtfi.com/example_homepage.txt. What do you mean the page has started loading? – Stagleton May 25 '12 at 19:31
  • I said the page took a long time to load (and when it did finish loading there was no response from the server). – Mihai Stancu May 25 '12 at 19:33
  • I've been having trouble getting this to work. I think I figured out a way to do it; I'll post it tomorrow (if it works :-) ). – Stagleton May 26 '12 at 19:59

This is what I did to get the result I wanted (prints out -70.19347 42.02112):

<?php
//fetches the web page and copies it to a text file
$ch = curl_init("http://photos.marinetraffic.com/ais/lightdetails.aspx?light_id=1000019773");
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);

//serves the output as plain text so the browser does not render it as HTML
header('Content-Type: text/plain');

//opens text file and reads contents to a string
$myFile = "example_homepage.txt";
$fh = fopen($myFile, 'r');
$theData = fread($fh,12000);
fclose($fh);

//finds the location of the beginning of the GPS data
$pos = strrpos($theData, "&centerx=");
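if ($pos === false) {
    // note: three equal signs, because a match at position 0 also compares equal to false with ==
    echo "not found";
    // stop here; the offsets below are meaningless without a match
    exit;
}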

//cuts out that substring and finds the positions of the x and y components
$subtract = 12000-$pos-36;
$rest = substr($theData, $pos, -$subtract);
//centerx is the longitude, centery is the latitude
$lon = substr($rest, 9, -17);
$latpos = strrpos($rest, "&centery=")+9;
$lat = substr($rest, $latpos);

//turns the values into floats
$lat = floatval($lat);
$lon = floatval($lon);

//echo $rest;
//prints the longitude first, then the latitude
echo $lon;
echo " ";
echo $lat;

?> 
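
The fixed offsets above (12000, 36, 17) depend on the exact page length and on how many digits the coordinates have. A less brittle variant of the same idea, sketched with a regular expression, could look like this. The function name find_last_position is hypothetical, and the pattern assumes centery always follows centerx directly, as it does in the links shown in the question.

```php
<?php
// Hypothetical helper: finds the last centerx/centery pair in the page text,
// mirroring the strrpos() call above, which also picks the last occurrence.
// Returns ['lat' => float, 'lon' => float] or null when no pair is present.
function find_last_position(string $page): ?array
{
    // Accepts both a raw "&" and an HTML-escaped "&amp;" between the two values.
    $pattern = '/centerx=(-?\d+(?:\.\d+)?)&(?:amp;)?centery=(-?\d+(?:\.\d+)?)/';
    if (!preg_match_all($pattern, $page, $matches, PREG_SET_ORDER)) {
        return null;
    }
    $last = end($matches);
    // centerx is the longitude, centery is the latitude.
    return ['lat' => (float) $last[2], 'lon' => (float) $last[1]];
}

// Usage, assuming $theData holds the saved page as a string:
// $position = find_last_position($theData);
```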

Hope this helps someone

user229044
Stagleton