0

I have a requirement to remove "duplicate" entries from a dataset, which is being displayed on the front-end of our application.

A duplicate is defined by the client as a speed test result which is in the same exchange.

Here is my current query,

SELECT id, isp, exchange_name, exchange_postcode_area, download_kbps, upload_kbps 
FROM speedtest_results
WHERE postcode IS NOT NULL 
AND exchange_name IS NOT NULL 
ORDER BY download_kbps DESC, upload_kbps ASC

This query would return some data like this,

12062   The University of Bristol   Bristol North   BS6     821235  212132
12982   HighSpeed Office Limited    Totton          SO40    672835  298702
18418   University of Birmingham    Victoria        B9      553187  336889
14050   Sohonet Limited             Lee Green       SE13    537686  104439
19981   The JNT Association         Holborn         WC1V    335833  74459
19983   The JNT Association         Holborn         WC1V    333661  84397
5652    University of Southampton   Woolston        SO19    330320  64200

As you can see, there are two tests in the WC1V postcode area, which I'd like to aggregate into a single result, ideally using max rather than avg.

How can I modify my query to ensure that I am selecting the fastest speed test result for the exchange whilst still being able to return a list of all the max speeds?

David Yell
  • 11,756
  • 13
  • 61
  • 100
  • Hey David, Does this help ? http://stackoverflow.com/a/2657495/968442 – nehem Dec 10 '15 at 11:47
  • So what values would you pick? One record has the faster download rate the other the faster upload rate. Which ID and which download and upload rate do you want to see in your results? And which isp and exchange_name in case you have more than one for a postcode area? – Thorsten Kettner Dec 10 '15 at 11:51

1 Answers1

0

Seems that I was far too hasty to create a question! I have since solved my own issue.

SELECT id, isp, exchange_name, exchange_postcode_area, MAX(download_kbps) as download_kbps, upload_kbps 
FROM speedtest_results
WHERE exchange_name IS NOT NULL
AND postcode IS NOT NULL
GROUP BY exchange_name
ORDER BY MAX(download_kbps) DESC
LIMIT 20
David Yell
  • 11,756
  • 13
  • 61
  • 100
  • So in case there is more than one entry for an exchange_name, you get one of its IDs arbitrarily, one of its ISPs, its area which is probably the same for all its records, its maximum download rate and one of its upload rates also arbitrarily chosen from all its records. This is what you were after? – Thorsten Kettner Dec 10 '15 at 11:58
  • We only display Upload, Exchange name and postcode area on the front-end of the website. It's a list of the fastests tests, so I don't mind which it used really. – David Yell Dec 10 '15 at 12:17
  • In general, failing to include within the GROUP BY non-aggregated columns from the SELECT is a recipe for disaster. – Strawberry Sep 19 '19 at 08:36