Property (houses) table: 22 million rows - EPC table: 14 million rows
Both tables have duplicate houses with different INSPECTION_DATE's.
The following statement works fine but results include duplicate houses. I only want to return the latest house (max(INSPECTION_DATE)).
SELECT
property.paon,
property.saon,
property.street,
property.postcode,
property.lastSalePrice,
DATE(property.lastTransferDate),
epc.ADDRESS1,
epc.POSTCODE,
epc.TOTAL_FLOOR_AREA,
epc.INSPECTION_DATE,
GLENGTH(LINESTRINGFROMWKB(LINESTRING(GEOMFROMTEXT(ASTEXT(POINTFROMWKB(POINT(longitude, latitude)))),
GEOMFROMTEXT(ASTEXT(POINTFROMWKB(POINT(-2.13609, 53.61405))))))) * 100 AS distance
FROM
property property
INNER JOIN
epc ON property.postcode = epc.POSTCODE
AND CONCAT(property.paon, ', ', property.street) = epc.ADDRESS1
WHERE
property.lastTransferDate >= CURRENT_DATE() - INTERVAL 5 YEAR
HAVING distance < 1.2
ORDER BY property.lastTransferDate DESC
LIMIT 10
I have tried the following statement with the help of a similar question (SQL select only rows with max value on a column) but I'm not entirely sure I've implemented it correctly.
Would appreciate someone to help me with my statement so I could identify and better understand for my use-case
SELECT
property.paon,
property.saon,
property.street,
property.postcode,
property.lastSalePrice,
DATE(property.lastTransferDate),
epc.ADDRESS1,
epc.POSTCODE,
epc.TOTAL_FLOOR_AREA,
GLENGTH(LINESTRINGFROMWKB(LINESTRING(GEOMFROMTEXT(ASTEXT(POINTFROMWKB(POINT(longitude, latitude)))),
GEOMFROMTEXT(ASTEXT(POINTFROMWKB(POINT(-2.13609, 53.61405))))))) * 100 AS distance
FROM
property property
INNER JOIN
epc ON property.postcode = epc.POSTCODE
AND (select max(epc.INSPECTION_DATE) from epc WHERE CONCAT(property.paon, ', ', property.street) = epc.ADDRESS1)
WHERE
property.lastTransferDate >= CURRENT_DATE() - INTERVAL 5 YEAR
HAVING distance < 1.2
ORDER BY property.lastTransferDate DESC
LIMIT 10
Example result illustrating issue: Statement Result Img