I have a requirement to remove "duplicate" entries from a dataset, which is being displayed on the front-end of our application.
A duplicate is defined by the client as a speed test result which is in the same exchange.
Here is my current query,
SELECT id, isp, exchange_name, exchange_postcode_area, download_kbps, upload_kbps
FROM speedtest_results
WHERE postcode IS NOT NULL
AND exchange_name IS NOT NULL
ORDER BY download_kbps DESC, upload_kbps ASC
This query would return some data like this,
12062 The University of Bristol Bristol North BS6 821235 212132
12982 HighSpeed Office Limited Totton SO40 672835 298702
18418 University of Birmingham Victoria B9 553187 336889
14050 Sohonet Limited Lee Green SE13 537686 104439
19981 The JNT Association Holborn WC1V 335833 74459
19983 The JNT Association Holborn WC1V 333661 84397
5652 University of Southampton Woolston SO19 330320 64200
As you can see, there are two tests in the WC1V postcode area, which I'd like to aggregate into a single result, ideally using max
rather than avg
.
How can I modify my query to ensure that I am selecting the fastest speed test result for the exchange whilst still being able to return a list of all the max speeds?