0

I'm investigating how MongoDB would work for us. One of the most used queries is used to get latest (or from a given time) measurements for each station. There is thousands of stations and each station has tens of thousands of measurements.

So we plan to have one collection for stations and another for measurements.

In SQL we would do the query with

SELECT * FROM measurements 
INNER JOIN ( 
  SELECT max(meas_time) station_id 
  FROM measurements 
  WHERE meas_time <= 'time_to_query' 
  GROUP BY station_id 
) t2 ON t2.station_id = measurements.station_id 
    AND t2.meas_time = measurements.meas_time

This returns one measurement for each station, and the measurement is the newest one before the 'time_to_query'.

What query should be used in MongoDB to produce the same result? We are really using Rails and MongoId, but it should not matter.

update: This question is not about how to perform a JOIN in MongoDB. The fact that in SQL getting the right data out of the table requires a join doesn't necessary mean that in MongoDB we would also need a join. There is only one table used in the query.

Mika
  • 1,419
  • 18
  • 37
  • MongoDB doesn’t support JOIN. – styvane Nov 26 '15 at 10:44
  • Possible duplicate of [How do I perform the SQL Join equivalent in MongoDB?](http://stackoverflow.com/questions/2350495/how-do-i-perform-the-sql-join-equivalent-in-mongodb) – chridam Nov 26 '15 at 11:04
  • I don't think this is a duplicate for the Join question. The fact that in SQL this query needs a join doesn't mean that in MongoDB it would need a join. – Mika Nov 26 '15 at 11:15
  • What have you got so far (sample mongo documents for each collection)? Have you tried anything yet (query, aggregation)? (Otherwise, this question falls into the 'gimme the codez now!' category) – joao Nov 26 '15 at 11:24
  • As I said, we are investigating. This query was identified as a potential challenge with MongoDB, so I'm asking what solutions others have found for this problem. The problem it self is quite general. – Mika Nov 26 '15 at 11:28

2 Answers2

0

I guess worst case solution would be something like this (out of my head):

meassures = []
StationId.all.each do |station|
  meassurement = Meassurment.where(station_id: station.id, meas_time <= 'time_to_query').order_by(meas_time: -1).limit(1)
  meassures << [station.name, meassurement.measure, ....]
end

It depends on how much time query can take. Data should anyway be indexed by station_id and meas_time.

How much time does the SQL query take?

Shadow The GPT Wizard
  • 66,030
  • 26
  • 140
  • 208
TheR
  • 63
  • 1
  • 6
0

We came up with this query

db.measurements.aggregate([{$group:{ _id:{'station_id':"$station_id"}, time:{$max:'$meas_time'}}}]);

with indexes

db.measurements.createIndex({ station_id: 1, meas_time: -1 });

Even though it seems to give the right data it is really slow. Takes roughly a minute to get a bit over 3000 documents from a collection of 65 million.

Just found that MongoDB is not using the index in this query even though we are using the 3.2 version.

Mika
  • 1,419
  • 18
  • 37