0

environment: Server version: 10.7.3-MariaDB-log

tables:

user location history:

CREATE TABLE `location_history` (
  `id` int(10) UNSIGNED NOT NULL,
  `userId` int(10) UNSIGNED DEFAULT NULL,
  `latitude` double(10,8) DEFAULT NULL,
  `longitude` double(11,8) DEFAULT NULL,
  `createdAt` timestamp NOT NULL DEFAULT current_timestamp()
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

polygons(areas) points:

CREATE TABLE `location_area_points` (
  `id` int(10) UNSIGNED NOT NULL,
  `location_area_id` int(10) UNSIGNED DEFAULT NULL,
  `area_group_id` int(10) UNSIGNED DEFAULT NULL,
  `latitude` double(10,8) DEFAULT NULL,
  `longitude` double(11,8) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

what I am trying to achieve: find out how long an user has been inside an area,

for example find out when exactly did the userId 1 last enter into area_group_id 24,

now, what we managed to do so far: find in which area, each point is using the following query:

SELECT
    location_history.id,
    location_history.userId,
    location_history.createdAt,
    s.location_area_id
FROM
    location_history
JOIN(
    SELECT
        location_area_points.location_area_id,
        ST_PolygonFromText(
            CONCAT(
                "POLYGON((",
                GROUP_CONCAT(
                    CONCAT(
                        location_area_points.latitude,
                        ' ',
                        location_area_points.longitude
                    ) SEPARATOR ', '
                ),
                "))"
            )
        ) AS polygon
    FROM
        location_area_points
    GROUP BY
        location_area_points.location_area_id
) s
ON
    ST_CONTAINS(
        s.polygon,
        POINT(
            location_history.latitude,
            location_history.longitude
        )
    )
ORDER BY
    createdAt
DESC
    

we get the following as an example with users 1 and 6, :

id  userId  createdAt       location_area_id    
11765   1   2022-07-18 17:03:23 24  
11764   1   2022-07-18 17:03:07 24  
11763   1   2022-07-18 17:02:25 24  
11762   1   2022-07-18 17:02:16 24  
11761   1   2022-07-18 17:01:24 24  
11760   1   2022-07-18 17:00:32 24  
11759   1   2022-07-18 16:59:41 24  
11758   1   2022-07-18 16:59:40 24  <----- include in the results
11757   1   2022-07-18 16:58:49 2   
11756   1   2022-07-18 16:58:04 2   
11755   1   2022-07-18 16:57:06 2   
11754   1   2022-07-18 16:56:23 24  
11752   1   2022-07-18 16:56:14 24  
11753   1   2022-07-18 16:56:14 24  
11751   1   2022-07-18 16:54:31 24  
11750   1   2022-07-18 16:54:30 24  
11749   6   2022-07-18 16:53:39 5   
11748   6   2022-07-18 16:52:47 5   
11747   6   2022-07-18 16:51:56 5   <----- include in the results
11746   6   2022-07-18 16:51:55 24  
11744   6   2022-07-18 16:51:04 24  
11745   1   2022-07-18 16:51:04 24  
11743   1   2022-07-18 16:50:13 24  
11740   1   2022-07-18 16:49:20 24  
11738   1   2022-07-18 16:48:29 24  

now I would like to run additional query on the result above, to find out the first occurrence of the last group please see the code above "include in the results"

so the final result should be :

 id userId  createdAt       location_area_id    
11758   1   2022-07-18 16:59:40 24
11747   6   2022-07-18 16:51:56 5

I apologize if my question is not structured well as I am not sure how to ask such a complicated question and I am open to advice/modification to the question at hand.

vasilevich
  • 717
  • 9
  • 30
  • 1
    Looks like a [window function](https://mariadb.com/kb/en/window-functions-overview/) applied twice. First on the `userId`,`location_area_id` partition getting the last item and then by `userId` getting the first item. – danblack Jul 18 '22 at 23:42
  • 1
    This is not a duplicate of question https://stackoverflow.com/questions/73000024/select-and-join-rows-when-column-value-changes-per-group/73002108#73002108 - but the answer I gave actually applies to this question better. In your case the question can be rephrased "when was the last time the location_area_id changed for this user?" The answer is a little more complicated than maybe it should be because you can't nest window functions in MySQL 8. – Jerry Jul 19 '22 at 02:01
  • @Jerry Thanks for linking me to your answer, I think it might be exactly what I need, I will try to test it out, please post an answer so that I can mark it as answered thanks! – vasilevich Jul 19 '22 at 14:31
  • @Jerry also it seems like the syntax in your answer (for mysql 8) does not work in mariadb 10.7.3, for example I tried your first sample SELECT assessment_id, device_id, test, result, LAG (result) over w as `previous_value`, LAG (assessment_id) over w as `previous_assessment_id` FROM assessments join results using(assessment_id) WINDOW w AS (PARTITION BY device_id, test ORDER BY assessment_id) but it says syntax error can you advice what to do ? – vasilevich Jul 19 '22 at 14:59
  • 1
    Based on https://mariadb.com/kb/en/function-differences-between-mariadb-107-and-mysql-80/#window-functions, the `LAG()` window function should work. – Jerry Jul 19 '22 at 16:24

2 Answers2

1

@danblack commented above that this is a window function problem, but there's a catch: MySQL 8 does not support nested window functions.

What window functions do is take the result of your query, partition it by rules you state, then allow you to add extra columns to the result with information based on that grouping.

You provided the table definitions - yay! You did not provide insert statements for the test data - aww. To test my answer I will be creating a table that matches your result set above, then selecting all to perform the window functions.

Window functions allow you to add extra columns in the result, that include information about neighboring rows.

SELECT
    location_history.id,
    location_history.userId,
    location_history.createdAt,
    s.location_area_id,
    LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt) as previous_area
FROM
... (the rest of your query)

This will produce a result set just like the one above, but with a column for where that user was in the previous record.

Window functions apply to the result of a query after the query is essentially complete, so there's not a way to narrow the results except to use them as a subquery. To get only the rows where the location changed:

SELECT * from (
    // insert big query here
) as `transitions`
WHERE where location_area_id != previous_area OR previous_area IS NULL

You may not care about the case where previous_area is null, it is just the first record for each user. But if the user hasn't changed locations since, then that record may be relevant to you.

Now we have a list of every time the location changed.

+-------+--------+---------------------+------------------+---------------+
| id    | userId | createdAt           | location_area_id | previous_area |
+-------+--------+---------------------+------------------+---------------+
| 11758 |      1 | 2022-07-18 16:59:40 |               24 |             2 |
| 11755 |      1 | 2022-07-18 16:57:06 |                2 |            24 |
| 11747 |      6 | 2022-07-18 16:51:56 |                5 |            24 |
| 11744 |      6 | 2022-07-18 16:51:04 |               24 |          NULL |
| 11738 |      1 | 2022-07-18 16:48:29 |               24 |          NULL |
+-------+--------+---------------------+------------------+---------------+

The remaining challenge is to find the LATEST row for each user. It seems like a good time for another window function, but that's not supported in MySQL 8. We can modify the above to get that result:

SELECT MAX(id) as last_transition from (
    // insert big query here
) as t
WHERE where location_area_id != previous_area OR previous_area IS NULL
GROUP BY userId

which yields

+-----------------+
| last_transition |
+-----------------+
|           11758 |
|           11747 |
+-----------------+

A quick check shows that this agrees with the records you indicate in the question.

So now we can take the big query, and join it with this result (that also uses the big query), and have the complete answer:

WITH big_query AS (
SELECT
    location_history.id,
    location_history.userId,
    location_history.createdAt,
    s.location_area_id
    LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt) as previous_area
FROM
    location_history
JOIN(
    SELECT
        location_area_points.location_area_id,
        ST_PolygonFromText(
            CONCAT(
                "POLYGON((",
                GROUP_CONCAT(
                    CONCAT(
                        location_area_points.latitude,
                        ' ',
                        location_area_points.longitude
                    ) SEPARATOR ', '
                ),
                "))"
            )
        ) AS polygon
    FROM
        location_area_points
    GROUP BY
        location_area_points.location_area_id
) s
ON
    ST_CONTAINS(
        s.polygon,
        POINT(
            location_history.latitude,
            location_history.longitude
        )
    )
ORDER BY
    createdAt
DESC
)

SELECT * from big_query
  JOIN (
    SELECT MAX(id) as id FROM big_query
    WHERE location_area_id != previous_area OR previous_area IS NULL 
    GROUP BY userId
  ) as last_transitions using(id)

with the final answer

+-------+--------+---------------------+------------------+---------------+
| id    | userId | createdAt           | location_area_id | previous_area |
+-------+--------+---------------------+------------------+---------------+
| 11758 |      1 | 2022-07-18 16:59:40 |               24 |             2 |
| 11747 |      6 | 2022-07-18 16:51:56 |                5 |            24 |
+-------+--------+---------------------+------------------+---------------+

The WITH statement lets you take a SELECT result and treat it like a table in its own right for the duration of the query. You might think of a better name than "big_query" :).

Jerry
  • 3,391
  • 1
  • 19
  • 28
  • wow, I am speechless, you are a genius. amazing explanation and i learnt alot thanks so much! – vasilevich Jul 19 '22 at 17:04
  • by the way, couldn't you just do SELECT *, MAX(id) as last_transition .... instead of going with the "with" and all the additional queries? it included the rest of the columns just as well – vasilevich Jul 19 '22 at 17:08
  • Please see here what I mean : https://stackoverflow.com/a/73041054/2294803 – vasilevich Jul 19 '22 at 17:14
  • 1
    When you have a GROUP BY clause, the value it chooses out of the group for the other columns is not defined. Newer versions of MySQL would not even allow that syntax unless you altered settings. Unless mariaDB is intentionally different in this case. – Jerry Jul 19 '22 at 20:36
  • 1
    In any recent MySQL, your version would generate an error like: Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'big_query.createdAt' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by – Jerry Jul 19 '22 at 20:46
0

Thanks to @Jerry https://stackoverflow.com/a/73040422/2294803 with his amazing answer. I used his example and the final query I came up with which works on mariadb 10.7.3

for completness:

SELECT
    *,
    MAX(id) AS last_transition
FROM
    (
    SELECT
        *
    FROM
        (
        SELECT
            location_history.id,
            location_history.userId,
            location_history.createdAt,
            s.location_area_id,
            LAG(location_area_id) OVER(
            PARTITION BY userId
        ORDER BY
            createdAt
        ) AS previous_area
    FROM
        location_history
    JOIN(
        SELECT
            location_area_points.location_area_id,
            ST_PolygonFromText(
                CONCAT(
                    "POLYGON((",
                    GROUP_CONCAT(
                        CONCAT(
                            location_area_points.latitude,
                            ' ',
                            location_area_points.longitude
                        ) SEPARATOR ', '
                    ),
                    "))"
                )
            ) AS POLYGON
        FROM
            location_area_points
        GROUP BY
            location_area_points.location_area_id
    ) s
ON
    ST_CONTAINS(
        s.polygon,
        POINT(
            location_history.latitude,
            location_history.longitude
        )
    )
ORDER BY
    createdAt
DESC
    ) AS `transitions`
WHERE
    location_area_id != previous_area OR previous_area IS NULL
) AS t
WHERE
    location_area_id != previous_area OR previous_area IS NULL
GROUP BY
    userId

gives me the result:

id  userId  createdAt   location_area_id    previous_area   last_transition 
11758   1   2022-07-18 16:59:40 24  2   11758   
5121    3   2022-07-18 00:05:30 2   10  5121    
2364    4   2022-05-03 22:59:48 11  2   2364    
12978   5   2022-07-19 17:12:41 2   10  12978   
1747    12  2022-05-03 12:23:35 2   NULL    1747    
1703    14  2022-05-03 02:49:57 24  NULL    1703    
1734    17  2022-05-03 11:08:43 24  NULL    1734    
2623    24  2022-05-29 11:55:59 2   NULL    2623    
2610    25  2022-05-17 07:39:02 2   NULL    2610    
2620    29  2022-05-29 11:48:04 13  2   2620    
2629    35  2022-05-29 13:40:45 2   NULL    2629    
3215    36  2022-07-17 22:48:32 24  25  3215    
11777   41  2022-07-19 09:47:30 24  NULL    11777   
3252    42  2022-07-17 22:50:09 24  NULL    3252    

which seem exactly what I have been looking for!

Thanks to @Jerry

vasilevich
  • 717
  • 9
  • 30
  • 1
    Please refer to this documentation to evaluate whether eliminating the JOIN is valid in your case: https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html. In another SO answer, I just saw this behavior referred to as "notorious" - https://stackoverflow.com/a/73050564/491754. You might want to avoid it. – Jerry Jul 20 '22 at 15:56