TOP [N] Records Group By per user query in the best possible way

Question

I faced an interesting problem today:

Lets suppose the following conditions

1. There are n number of users

2. The system collects GPS coordinates of each driver as they move

3. We have to query last 10 GPS Coordinate records per user sorted by LAST_UPDATE_DATE in descending order

4. There are over 1982008 records in the table

I solved this by querying the last {(10 + [threshold value]) * n} records, and arranged them in java using a HashMap where the user is Key and List of respective coordinates are values

This doesn't seem to be a best possible solution as it may choose random records and may not cover up 10 records per user,

Querying it per user in a for loop is also not a best solution as it would need multiple database calls

Can anybody suggest what would be a best possible solution for this using Spring data JPA, I am also open to use Normal JDBC in case Spring data JPA is unable to solve this

I am using Spring data JPA, Mysql, Java 8 for this

Here is the create table command for this

@RaymondNijland Ok sure, here is the create table command

CREATE TABLE `gps_coordinate` (
    `ID` BIGINT(50) NOT NULL AUTO_INCREMENT,
    `user_id` VARCHAR(255) NULL DEFAULT '0',
    `driver_id` INT(20) NULL DEFAULT '0',
    `latitude` VARCHAR(50) NULL DEFAULT '0.00000000',
    `longitude` VARCHAR(50) NULL DEFAULT '0.00000000',
    `distance_in_miles` VARCHAR(50) NULL DEFAULT '0.00',
    `distance_in_kms` VARCHAR(50) NULL DEFAULT '0.00',
    `device_time` DATETIME NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `created_by` VARCHAR(50) NULL DEFAULT 'Anonymous',
    `created_date` DATETIME NULL DEFAULT CURRENT_TIMESTAMP,
    `updated_by` VARCHAR(50) NULL DEFAULT 'Anonymous',
    `updated_date` DATETIME NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`ID_HUB`),
    INDEX `user_fk` (`user_id`),
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=208445
;

Getting the top N per group in MySQL can be a real pain, [see here](https://stackoverflow.com/questions/12113699/get-top-n-records-for-each-group-of-grouped-results). — Tim Biegeleisen, Jun 25 '18 at 15:44
@TimBiegeleisen MySQL 8.0 makes it better in lower MySQL versions it is indeed a real pain — Raymond Nijland, Jun 25 '18 at 15:45
@RaymondNijland Ok sure I am adding the create table command in the question — Kalyan Pradhan, Jun 25 '18 at 15:46
@RaymondNijland I had this in mind when I commented, but the thing is, the analytic functions aren't part of JPA anyway, so this still doesn't help the OP necessarily. — Tim Biegeleisen, Jun 25 '18 at 15:46
Well iam pretty sure JPA framework has some way to use a native SQL query on the database like anny other database abstraction layer. @TimBiegeleisen — Raymond Nijland, Jun 25 '18 at 15:48
@RaymondNijland Yes I have used JPA with native mysql queries so if there is any way in native mysql then can you suggest some ideas? — Kalyan Pradhan, Jun 25 '18 at 15:50
@RaymondNijland Yes, it does. But using raw SQL everywhere defeats the purpose of having Hibernate, at least partially. — Tim Biegeleisen, Jun 25 '18 at 15:50
@TimBiegeleisen Although I am using mysql 5.7.18 as this is a legacy database can you suggest some possible solutions with Mysql 8 ? — Kalyan Pradhan, Jun 25 '18 at 15:52
@KalyanPradhan I would suggest looking into writing a stored proc to handle this requirement. You can easily call the proc from your Java code. — Tim Biegeleisen, Jun 25 '18 at 15:53
@TimBiegeleisen can you help out in writing a stored procedure or hint something? — Kalyan Pradhan, Jun 25 '18 at 16:03
Follow the link I gave in the very first comment. This will get you thinking about several methods to do this query in MySQL 5.x. Once you feel comfortable, then worry about how you will call it from your Java code. — Tim Biegeleisen, Jun 25 '18 at 16:04
The problem with the ranking algoritme TimBiegeleisen provided is that it needs to scan the complete table or scan the complete index to rank (should be pretty fast tho with this amount off records but will get slower then the records number goes up)... So it might be smarter to design a table where you store and update the 10 last records per user_id and query that table instead — Raymond Nijland, Jun 25 '18 at 16:06
@RaymondNijland That also seems to be a nice Idea, let me try that as well, However we also need historic records, and sort by date in future, for this purpose we can add an archive table where we can add older records, That's also a nice Idea — Kalyan Pradhan, Jun 25 '18 at 16:15

Davide Lorenzo MARINO · Accepted Answer · 2018-06-25T15:59:38.210

1

The best solution generally is to leave the sorting and extraction to the database. If you need to sort and extract data locally in java you need to extract all the data from the database to your java server. This generate a huge overhead in network traffic, generating poor performances.

There are many databases that handle very well geo spatial data and can query your tables offering sorting by distance or similar. If you need to get only the last 10 positions per user you don't need geo spatial features and any database work well.

For a solution in MySql read this article.

For a geo database you can also use MySql. Here a link to the spatial data types supported by MySql.

edited Jun 25 '18 at 15:59

answered Jun 25 '18 at 15:45

Davide Lorenzo MARINO

26,420
4
39
56

Lorenzo can you suggest some of the GEO spatial database that we can look into? – Kalyan Pradhan Jun 25 '18 at 15:56
Lorenzo Thanks for the Info, I will also try them out – Kalyan Pradhan Jun 25 '18 at 16:17
The first solution with Mysql was really helpful and will also try out the spatial data types support, Thanks – Kalyan Pradhan Jun 25 '18 at 18:18

TOP [N] Records Group By per user query in the best possible way

1 Answers1