1

Possible Duplicate:
How to prevent SQL injection in PHP?
Select distinct rows from MySQL Database

Hi I have a phonegap application that stores latitude, longitude, address and severity when a road surface deformation (such as pothole and speedbump) is detected.

So far saving these in the database through a PHP PDO is not a problem at all. The PDO is designed in a way that if the pothole being reported has already been reported 10 times (checks the database for any entries within a 15 meter range), then it would not be reported (i.e inserted in the database again). Also, loading the surface deformations is not a problem either, i am using the Haversine formula to do that, where I pass the latitude and longitude of the user and get the values within a certain distance.

$stmt = $dbh->prepare("
  SELECT
    lat, lng,
    ( 6378160 * acos( cos( radians(?) ) * cos( radians( lat ) ) * cos( radians( lng ) - radians(?) ) + sin( radians(?) ) * sin( radians( lat ) ) ) ) AS distance
  FROM myTable
  HAVING distance > 0 
  ORDER BY distance
  LIMIT 0 , 30
");

The issue I have is that since the same pothole can be reported 10 times, I am ending up having the same pothole reported back to the application for charting on a map 10 times. What I need to do is, get the list of potholes that are within a certain distance from the user (done using the haversine formula), and then out of this list, filter the potholes so that I only get distinct potholes rather than the same pothole being returned back 10 times. Anyone has any idea how I can make such filtering? Can anyone tell me how is it posssible to do this in PHP/PDO or point me to some similar tutorial if available?

Here is what I need to do in brief: say I am near pothole A and Pothole B, and say I have 6 reports for pothole A, and 8 reports for pothole B (and so on) in the database. By using the haversine formula I get all the values of the reports for pothole A and pothole B (ie 14 results). What I need is rather I get the midpoint of the reports for pothole A and midpoint of the reports for Pothole B (using: http://www.geomidpoint.com/calculation.html) and return back 2 results (one for A and one for B) rather than 14 results.

Community
  • 1
  • 1
user1809790
  • 1,349
  • 5
  • 24
  • 53
  • 3
    How is this different from your previous question? http://stackoverflow.com/questions/14120373/select-distinct-rows-from-mysql-database. If you are not getting any helpful answers on your other question, then consider editing it instead of posting another version. – Taryn Jan 02 '13 at 11:41
  • 1
    @bluefeet the other question was marked for deletion as the question is now how to do this in PHP/PDO rather than asking for SQL only – user1809790 Jan 02 '13 at 11:50
  • Do you require to hold a record of each report? – david strachan Jan 02 '13 at 16:50
  • @davidstrachan the idea is so that I get the mid-point of the 10 results in order to get a more accurate point for the pothole – user1809790 Jan 02 '13 at 17:00
  • As the question is closed I cannot answer it.You will need another int field **cnt** in your database to hold number of instances. When you enter a new _siteing_ of pothole use the [Haversine](http://stackoverflow.com/questions/13997385/selecting-distinct-longitude-and-latitude-values-and-finding-their-median) formula to find if it is within the limits of an existing one. If it is recalculate the new centre,increment **cnt** and UPDATE the record.If the pothole is new INSERT new record with **cnt** =1. Only display markers when **cnt** >10 – david strachan Jan 02 '13 at 17:39
  • @user1809790 I have produced a [DEMO](http://daviestrachan.co.uk/maps/save.php) using the ideas in above comment. I will update my answer in earlier question http://stackoverflow.com/questions/13997385/selecting-distinct-longitude-and-latitude-values-and-finding-their-median shortly – david strachan Jan 04 '13 at 12:41

2 Answers2

1

You have to separate potholes and reports. First query what potholes are around GROUP BY lat, lng and then query one report for each pothole with LIMIT 1.

$rows = dbh_query("SELECT id FROM ... ");
foreach ($rows as $row)
{
    dbh_query("SELECT ... WHERE id = :id LIMIT 1", array('id' => $row['id']));
}
PiTheNumber
  • 22,828
  • 17
  • 107
  • 180
  • can you tell me what the last bit of your answer means please? WHERE id IN (5,90,8) – user1809790 Jan 02 '13 at 11:51
  • I made an update. Hope that helps. – PiTheNumber Jan 02 '13 at 12:03
  • so if I have understood it well, first I get a list of ids of all the potholes that are say 10 meters away from me. Then using the in, I select the potholes that are say 5 meters away from the ids returned in the previous statement? Wouldnt that give the same result or not? – user1809790 Jan 02 '13 at 12:24
  • Yes, that is absolutly right. That made no sence at all :) I corrected this, but you might want to use something like this: http://stackoverflow.com/questions/2917558/mysql-ignore-a-selected-column-when-using-distinct – PiTheNumber Jan 02 '13 at 12:33
0

I think you have 2 options to elaborate. Unfortunately both implies some complications.

1) Instead of writing multiple observations, you should always clusterize them at writing time. For example, if you support space granularity of 10 meters, then every time a new measurement arrives that is less then in 10 meters from existing record, you do not add a new pothole, but change average values (latitude, longitude, counter) in the nearest existing record. This way you'll end up with 2 records for potholes A and B for your example, so you can use DISTINCT query.

2) For every request, you can fetch all records in 15 meter range from existing table and create a temporary table on them for calculating a probability density function along the road axe. Again, this requires to choose a granularity which can be simulated as decimal accuracy in the ROUND function. For example, if you would have a stored function for calculating distance between current point and existing records, you could write:

INSERT INTO `temppdf` (dist, pothole_id)
SELECT FROM `maintable`
ROUND(distance(@current_lat, @current_lon, maintable.lat, maintable.lon), -1), pothole_id
WHERE distance(@current_lat, @current_lon, maintable.lat, maintable.lon) < 15;

Then you can query temppdf for rows with maximum counters for every pothole, something like this:

SELECT pothole, MAX(cnt) as `peak` FROM
(SELECT DISTINCT pothole, COUNT(dist) as cnt FROM `temppdf` GROUP BY pothole, dist) as `subq`
GROUP BY pothole;

The potholes with counters larger than a threshold are the result.

Stan
  • 8,683
  • 9
  • 58
  • 102