1

So I was working on a project about transportation system. There are buses reporting to the server. All buses insert new rows into a table called Events every 2 sec, so it is a large table.

Each bus has a unique busID. I want to get a table which contains all buses but only their latest report.

Here are things I tried:

  • Firstly I think I could ORDER BY time DESC LIMIT 20 It turns out that it is sorting the entire table first then doing the LIMIT thing second... which actually make sense, how else could it sort?
  • So I was googling and found out that its much faster to sort with the index. So i did ORDER BY id DESC LIMIT 20; It gave me the latest 20 entries pretty fast.
  • However I don't really need the latest 20 entries instead that I need the latest entry from all buses. So I was thinking about combining GROUP BY bus with ORDER BY id somehow but didn't really figure that out...
  • Next I read about another post on this site about speeding things up when you only need the max value of a column in each group. So finally I came up with SELECT driver,busID,route,timestamp,MAX(id) FROM Events GROUP BY bus However it seems like using MAX(id) does not really help...
  • And I think about first using ORDER BY id LIMIT (some number) to make a sub table, then find the newest entry of each bus within the sub table. But a problem is that, the tablet on the bus which is sending report might accidentally go offline thus unable to insert new rows. So I don't really know how large should I make the sub table so that it contains at least the latest entry of each bus...

So I am kinda running out of ideas... I am still a noob in mySQL, so maybe there are other better functions to use? Or maybe I am complexing things? I though it wouldn't be so hard to do at the begin ...

Any advice would be greatly appreciated.


I also read about this Retrieving the last record in each group which is brilliant! But it still takes forever in my case...


CREATE TABLE `Events` (
  `id` bigint(20) NOT NULL auto_increment,
  `driver` varchar(200) collate utf8_unicode_ci default NULL,
  `bus` varchar(200) collate utf8_unicode_ci default NULL,
  `route` varchar(50) collate utf8_unicode_ci default NULL,
  `time` datetime default NULL,
  `clientTime` datetime default NULL,
  `latitude` decimal(30,20) default NULL,
  `longitude` decimal(30,20) default NULL,
  `accuracy` int(11) default NULL,
  `speed` decimal(30,20) default NULL,
  `heading` decimal(30,20) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=66528487 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Thank you all for helping me! But it is time to talk to professor! Maybe I am not supposed to do that hmm...

Community
  • 1
  • 1
user3695760
  • 179
  • 1
  • 7
  • have you tried `MAX(timestamp) over (partition by busID order by index desc)` – minatverma Jan 21 '16 at 07:30
  • Thanks for replying I'll try it :) – user3695760 Jan 21 '16 at 07:34
  • Use a Trigger and insert/update a second table that holds only one record per bus (busID Primary) like insert into new_table ...... ON DUPLICATE JEY UPDATE ......; so you have a very small table with all informations you want – Bernd Buffen Jan 21 '16 at 07:35
  • But I don't really have control over the database.. I can only query it. I was thinking maybe we had a table with only the latest report too. But I didn't really find anything like that... ah maybe I missed it, I should look at other tables again I guess... – user3695760 Jan 21 '16 at 07:39
  • https://dev.mysql.com/doc/refman/5.6/en/example-maximum-column-group-row.html Provides an alternative solution to left join. The speed of any sort heavily depends on the size of your table / preselected result records and that you have an index on the timestamp column – Michel Feldheim Jan 21 '16 at 07:52
  • Thank you for whomever helped me with the format. – user3695760 Jan 21 '16 at 08:08

3 Answers3

2

You have to use indexes: id is a primary key and is already indexed, so sorting by id should be fast, but bus and time are not indexed. I would add a composite unique index like this:

alter table Events add unique index idx_bus_time (bus, time);

this should make the following query much faster:

select bus, max(time)
from Events
group by bus

then you can easily get the last info for each bus:

select e.*
from Events e INNER JOIN (
  select bus, max(time) max_time
  from Events
  group by bus) l on e.bus=l.bus AND e.time=l.max_time

another thing you can do to improve performances is to create a busses table:

create table busses (
  id int primary key auto_increment,
  bus varchar(200)
)

and alter the original table, and use a bus_id INT instead of the bus VARCHAR(200), and index the bus_id and the time column together.

fthiella
  • 48,073
  • 15
  • 90
  • 106
  • Thanks! I will talk to my professor and see if we can get them indexed. – user3695760 Jan 21 '16 at 17:50
  • 1
    What version of MySQL are you using? 5.6 will let you add the index without locking the table. Otherwise, use pt-online-schema-change to add the index with virtually zero downtime. – Rick James Jan 22 '16 at 06:31
0

I would rather make it simple,

I would add one column in the table i.e. < latest_record >...

Now, for the particular < bus_id >'s latest record or Event, it would have value < 0 > in < latest_record > field.

once another entry/event for the same < bus_id > is arrived, before inserting that entry/event, I would update previous_latest entry/event's < latest_record >'s value to '1' and the newly arrived entry will have < latest_record >'s value to '0'

Now, you just have to make Index on < latest_record > and you can find all unique bus_id's latest entry by filtering latest_record='0' in WHERE clause

Hytool
  • 1,358
  • 1
  • 7
  • 22
  • Thanks! Too bad I don't have control over the database ... It is apparently doable though because we have an old api which will just return what I want and it is fast. I don't know what query statement they were using though. I started to feel like maybe there are other useful tables... I will take another look on other tables too – user3695760 Jan 21 '16 at 07:44
  • 1
    can you give result of SHOW CREATE TABLE < your_table > ? – Hytool Jan 21 '16 at 07:46
  • Just added to the op. – user3695760 Jan 21 '16 at 07:54
  • It's not what I'd do. – Strawberry Jan 21 '16 at 08:38
  • @Strawberry, You might have reckoned - adding a column in very large table.... so this solution's possibly not recommended ! May be Adding multi-column index on (busid, timestamp) with tricky query would solve it...What's your solution ? – Hytool Jan 21 '16 at 09:21
  • 1
    fthiella's pretty much nailed it – Strawberry Jan 21 '16 at 09:58
0

The solution would be so simple if you could simply INSERT into a new table that contains one row per bus -- the current status of the bus.

Rick James
  • 135,179
  • 13
  • 127
  • 222