0

Property (houses) table: 22 million rows - EPC table: 14 million rows

Both tables have duplicate houses with different INSPECTION_DATE's.

The following statement works fine but results include duplicate houses. I only want to return the latest house (max(INSPECTION_DATE)).

SELECT 
    property.paon,
    property.saon,
    property.street,
    property.postcode,
    property.lastSalePrice,
    DATE(property.lastTransferDate),
    epc.ADDRESS1,
    epc.POSTCODE,
    epc.TOTAL_FLOOR_AREA,
    epc.INSPECTION_DATE,
    GLENGTH(LINESTRINGFROMWKB(LINESTRING(GEOMFROMTEXT(ASTEXT(POINTFROMWKB(POINT(longitude, latitude)))),
                            GEOMFROMTEXT(ASTEXT(POINTFROMWKB(POINT(-2.13609, 53.61405))))))) * 100 AS distance
FROM
    property property
        INNER JOIN
    epc ON property.postcode = epc.POSTCODE
        AND CONCAT(property.paon, ', ', property.street) = epc.ADDRESS1 
WHERE
    property.lastTransferDate >= CURRENT_DATE() - INTERVAL 5 YEAR

HAVING distance < 1.2
ORDER BY property.lastTransferDate DESC
LIMIT 10

I have tried the following statement with the help of a similar question (SQL select only rows with max value on a column) but I'm not entirely sure I've implemented it correctly.

Would appreciate someone to help me with my statement so I could identify and better understand for my use-case

    SELECT 
    property.paon,
    property.saon,
    property.street,
    property.postcode,
    property.lastSalePrice,
    DATE(property.lastTransferDate),
    epc.ADDRESS1,
    epc.POSTCODE,
    epc.TOTAL_FLOOR_AREA,
    GLENGTH(LINESTRINGFROMWKB(LINESTRING(GEOMFROMTEXT(ASTEXT(POINTFROMWKB(POINT(longitude, latitude)))),
                            GEOMFROMTEXT(ASTEXT(POINTFROMWKB(POINT(-2.13609, 53.61405))))))) * 100 AS distance
FROM
    property property
        INNER JOIN 
    epc ON property.postcode = epc.POSTCODE
        AND (select max(epc.INSPECTION_DATE) from epc WHERE CONCAT(property.paon, ', ', property.street) = epc.ADDRESS1)
WHERE
    property.lastTransferDate >= CURRENT_DATE() - INTERVAL 5 YEAR 

HAVING distance < 1.2
ORDER BY property.lastTransferDate DESC
LIMIT 10

Example result illustrating issue: Statement Result Img

Mike
  • 61
  • 7
  • update your question add a proper data sample and the expected result – ScaisEdge Mar 28 '18 at 19:08
  • Maybe this post would help. https://stackoverflow.com/questions/18221999/group-by-having-max-date – jeffkempf Mar 28 '18 at 19:16
  • whiout a proper data sample is not possible procede valid code .. – ScaisEdge Mar 28 '18 at 19:31
  • You can put a "First-cut bounding box" around your circle. that reduce the rows to filter and the query can use a index on long, lat and date . see: https://www.movable-type.co.uk/scripts/latlong-db.html – Bernd Buffen Mar 28 '18 at 19:35
  • @scaisEdge I've added "Example result illustrating issue" at the bottom of my post. Thanks – Mike Mar 28 '18 at 19:36
  • @scaisEdge fyi... the query worked (with distance < 50000)... but it took 284 seconds to execute. Do you know any way to improve efficiency? – Mike Mar 30 '18 at 13:38
  • 50000 ?? what are meters ? have you tried with 10000 .. or less ? – ScaisEdge Mar 30 '18 at 13:52
  • but which index you have on the property and epc tables ?? – ScaisEdge Mar 30 '18 at 14:00
  • Yes I tried with < 1.2 at first (miles) but to try speed up query I put 50000 just to see if it worked. Now I know it works, it just needs more efficiency – Mike Mar 30 '18 at 15:06

0 Answers0