
I use PHP to process latitude/longitude points in order to generate JS that displays the points on an OSM map. I would like to start a new track on the map whenever there is a pause of 10 minutes or more in the recording.

My dataset currently has about 30,000 records spread over about 10 different tracks (some tracks have about 300 points, others have thousands).

I have hit a performance issue with PHP. When the loop aggregates a few hundred points, the data is processed quickly, but when a track has thousands of points, performance drops dramatically.

Here is, for each track, the time (in seconds) needed to process 10,000 points:

+-----------------+-------------------------------+
| Points On Track | Time To Process 10,000 Points |
+-----------------+-------------------------------+
|              21 | 0.75                          |
|           18865 | 14.52                         |
|             539 | 0.79                          |
|             395 | 0.71                          |
|             827 | 0.79                          |
|             400 | 0.74                          |
|             674 | 0.78                          |
|            2060 | 1.01                          |
|            2056 | 0.99                          |
|             477 | 0.73                          |
|             628 | 0.77                          |
|             472 | 0.73                          |
+-----------------+-------------------------------+

We can see that when a track has a lot of points, performance drops dramatically. In this particular case, processing all points requires about 30 seconds. If I limit each track to 500 points, performance is pretty good (about 2.5 seconds to process my whole data set).

I use my Synology DS415play as the web server.

Here is my code:

$dataTab = array();
if ($result = $mysqli->query($sql)) {
    $count = 0;

    $row = $result->fetch_array();
    $data = $dataTab[$tabPrt] . "[" . $row['latitude'] . "," . $row['longitude'] . "],";
    $date = new DateTime($row['time']);

    while ($row = $result->fetch_array()) {
        $count++;

        $newDate = new DateTime($row['time']);
        if (($newDate->getTimestamp() - $date->getTimestamp()) > 600) {
            array_push($dataTab, $data);
            $data = "";
            $count = 0;
        }

        $data = $data . "[" . $row['latitude'] . "," . $row['longitude'] . "],";
        $date = $newDate;
    }
    array_push($dataTab, $data);
}
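For reference, a common way to cut per-iteration string work in a loop like this is to buffer the coordinate pairs in an array and `implode` once per track, and to use `strtotime` instead of constructing a `DateTime` object for every row. This is only a sketch of that idea, not the original code; the `groupTracks` helper and the `$rows` shape (an array of `time`/`latitude`/`longitude` rows, as returned by the query) are assumptions for illustration:

```php
<?php
// Sketch: group points into tracks, buffering pairs in an array and
// imploding once per track instead of growing one large string.
// Assumes $rows look like the gpsData result: time, latitude, longitude.

function groupTracks(array $rows, int $gapSeconds = 600): array
{
    $tracks = [];
    $current = [];
    $prevTs = null;

    foreach ($rows as $row) {
        $ts = strtotime($row['time']); // cheaper than new DateTime per row
        if ($prevTs !== null && ($ts - $prevTs) > $gapSeconds) {
            // Gap of more than $gapSeconds: close the current track.
            $tracks[] = '[' . implode('],[', $current) . ']';
            $current = [];
        }
        $current[] = $row['latitude'] . ',' . $row['longitude'];
        $prevTs = $ts;
    }
    if ($current) {
        $tracks[] = '[' . implode('],[', $current) . ']';
    }
    return $tracks;
}
```

Note the output differs slightly from the original (no trailing comma after the last pair), so the JS-generation side would need a matching tweak.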

If I limit each track to 500 points like this, performance is pretty good:

$dataTab = array();
if ($result = $mysqli->query($sql)) {
    $count = 0;

    $row = $result->fetch_array();
    $data = $dataTab[$tabPrt] . "[" . $row['latitude'] . "," . $row['longitude'] . "],";
    $date = new DateTime($row['time']);

    while ($row = $result->fetch_array()) {
        $count++;

        $newDate = new DateTime($row['time']);
        if (($newDate->getTimestamp() - $date->getTimestamp()) > 600
            || $count > 500) {
            array_push($dataTab, $data);
            $data = "";
            $count = 0;
        }

        $data = $data . "[" . $row['latitude'] . "," . $row['longitude'] . "],";
        $date = $newDate;
    }
    array_push($dataTab, $data);
}

Thanks

EDIT: I provide a sample of the data here: http://109.190.92.126/tracker/gpsData.sql
Slow script: http://109.190.92.126/tracker/map.php
Normal execution speed by splitting each track (500 pts max): http://109.190.92.126/tracker/map_split.php


  • Can we see your mysql queries as well please? – Paul Stanley Oct 21 '15 at 20:22
  • How much memory did you assign to PHP? To me this looks like PHP running short on memory, so you end up doing "expensive" hard-disk operations when it comes to huge numbers of entries. – dognose Oct 21 '15 at 20:34
  • Your code can be rewritten so you only query MySQL, as you're doing a simple date-based operation and enclosing everything with `[]` - I'm going to guess you need JSON output, so you can query MySQL and perform `json_encode` on the resulting array/object. However, the "culprit" is 100% MySQL here. It's your query that's slow, not iterating over so many records in an array. – N.B. Oct 21 '15 at 20:36
  • Here is my query: SELECT time, latitude, longitude FROM gpsData WHERE time >= '2015-09-01' AND provider = 'gps'. – remove before flight Oct 21 '15 at 20:37
  • PHP has 128 MB allocated. – remove before flight Oct 21 '15 at 20:37
  • Whoops, indeed the memory calculation was wrong - forgot the "KB" in between, so failed by a factor of 1024 :P – dognose Oct 21 '15 at 20:51
  • tiny side note: using `$array[] = $newEntry` rather than `array_push($array, $newEntry)` is about 50% faster, see: http://stackoverflow.com/questions/559844/whats-better-to-use-in-php-array-value-or-array-pusharray-value – dognose Oct 21 '15 at 20:58
  • The query fetches 18000 records, if anyone wants to take my query in my answer and finish it, there it is. – Paul Stanley Oct 21 '15 at 21:02
  • Why do you think the performance problem is inside the while loop? It's more likely to be the query itself – symcbean Oct 21 '15 at 21:04
  • If the query outlined above caused such a performance issue for "only" 18,000 results, then every major website, including Stack Overflow, would never display a result. With only 30,000 total records, the query should finish in an unmeasurable amount of time, even if there are no indexes (given the MySQL server doesn't have some ridiculous configuration). – dognose Oct 21 '15 at 21:06
  • The query itself is processed in less than 0.3 s... – remove before flight Oct 21 '15 at 21:38
  • I can help you check the code if you send me a partial dump of the gpsData table. This would be interesting to hack on. – jpaljasma Oct 22 '15 at 11:27
  • @dognose: Can you provide a link to a page which is composed of the output of 18000 database records? I've never seen such a thing myself. – symcbean Oct 22 '15 at 22:28
  • @symcbean Are you kidding? I just ran some queries on a table made of 4 million records (683,636 results); it took 2.7 seconds on an ordinary desktop PC. `3 3 14:43:04 SELECT * FROM aTable WHERE TYPE_="VariableUpdate" AND VAR_TYPE_ = "string" LIMIT 0,1000000 683636 row(s) returned 0.000 sec / 2.750 sec` – dognose Oct 23 '15 at 12:45
  • Now try parsing that, embedding it into html, delivering it over http[s] and rendering it on a browser. – symcbean Oct 23 '15 at 14:41
  • @symcbean, I edited my initial post to give an extract of the database and a live demo of the problematic script! Thanks – remove before flight Oct 26 '15 at 21:09
  • @jpaljasma I edited my initial post to give an extract of the database and a live demo of the problematic script! Thanks – remove before flight Oct 26 '15 at 21:10
  • @removebeforeflight I am already checking. You have an extra index, but that doesn't bother me much. SQL is fast. I am trying to make your life easier by calculating the delta timestamp with SQL and then figuring out the concatenation part. I'll keep you posted. – jpaljasma Oct 27 '15 at 00:06
  • @removebeforeflight all right, I have posted a solution below - go ahead take a look. – jpaljasma Oct 27 '15 at 00:54

2 Answers


If you are fetching 18,000 records from the database in your worst-case scenario, you could move the timestamp check into the query to cut that down considerably. It looks like all you are doing is checking for a ten-minute gap and then pushing to an array, which could be done at the MySQL level; that way you wouldn't be fetching 18,000 rows from the database every time, just the ones you need.

If you post your mysql queries, we can take a look at putting that in there.

Edit: try changing the query to this:

SELECT time, latitude, longitude 
FROM gpsData 
WHERE  time >= '2015-09-01' 
AND provider = 'gps' 
ORDER BY time DESC
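One way to actually push the gap detection into MySQL is the classic user-variable trick: carry the previous row's timestamp in `@prev` and have the query emit a `new_track` flag wherever the gap exceeds 600 seconds. This is only a sketch (the evaluation order of user-variable assignments within a SELECT is not guaranteed by MySQL, and on MySQL 8+ the `LAG()` window function is the cleaner tool); the column names match the question's schema, but `splitByFlag` is a hypothetical helper:

```php
<?php
// Sketch: let MySQL flag where a new track starts, then split in PHP.
// User-variable trick (MySQL 5.x era); @prev starts as NULL, so the
// first row's flag evaluates to 0.
$sql = "
    SELECT g.time, g.latitude, g.longitude,
           IF(TIMESTAMPDIFF(SECOND, @prev, g.time) > 600, 1, 0) AS new_track,
           @prev := g.time
    FROM gpsData g, (SELECT @prev := NULL) init
    WHERE g.time >= '2015-09-01' AND g.provider = 'gps'
    ORDER BY g.time ASC";

// Splitting then becomes a simple scan over the flag column.
function splitByFlag(array $rows): array
{
    $tracks = [];
    $current = [];
    foreach ($rows as $row) {
        if ($row['new_track'] && $current) {
            $tracks[] = $current; // close the previous track at the gap
            $current = [];
        }
        $current[] = [$row['latitude'], $row['longitude']];
    }
    if ($current) {
        $tracks[] = $current;
    }
    return $tracks;
}
```

Note the query orders `ASC`, since the gap flag only makes sense when rows arrive in chronological order.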
Paul Stanley
  • My query is pretty trivial: SELECT time, latitude, longitude FROM gpsData WHERE time >= '2015-09-01' AND provider = 'gps' – remove before flight Oct 21 '15 at 20:38
  • This query does not meet my requirement because I would get only the points recorded in the last 10 minutes... I need all points since a specified date, displayed as different tracks wherever there is a 10-minute gap. – remove before flight Oct 21 '15 at 20:45

Here's the final product (it's on Heroku, so wait until the dyno starts up): http://sove.herokuapp.com/gps/

The idea behind the solution is to calculate the diff between timestamps on the server side, then manipulate the data in arrays.

The script finishes in 0.278 s on my MBP, takes 15.75 MB of memory, and the final output of $ts is an array of routes (I got 7 in total).

There are quite a few optimizations, including skipping identical consecutive coordinate points. The zoom bounds are not correct on the map, but you'll figure that out. I should really have asked for a bounty for this job... If you like the outcome, let me know and I can share the codebase.
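The "skip identical consecutive coordinates" optimization mentioned above can be sketched as a single pass that drops a point only when it equals the one immediately before it (the actual implementation lives in the linked gist; `dedupeConsecutive` here is just an illustration):

```php
<?php
// Sketch: drop consecutive duplicate [lat, lng] points before rendering,
// which shrinks the JS payload for GPS logs that idle in one spot.
function dedupeConsecutive(array $points): array
{
    $out = [];
    $prev = null;
    foreach ($points as $p) {
        if ($p !== $prev) { // keep only when different from the last kept point
            $out[] = $p;
        }
        $prev = $p;
    }
    return $out;
}
```

Non-adjacent repeats are kept on purpose, so a track that crosses the same spot twice still renders correctly.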

Source: https://gist.github.com/jpaljasma/04f54e0d2fa3a632071e

jpaljasma