
I have an app, and I'm early enough in the design to walk back the database choice. I'd like to use MongoDB, but here's where I'm running into potential issues: I will be computing averages frequently. Consider this case:

  • A trip leg is a certain number of miles long
  • A trip leg consumes a given amount of fuel
  • The average fuel economy is a computed value: simply miles divided by gallons
  • A more interesting statistic is the average economy of everyone doing the same leg
  • Another interesting statistic is the average economy of everyone doing a leg near specified start and end points

The last point involves a map/reduce across a query: obtain the total number of miles driven and divide by the total number of gallons consumed. Is this going to make my server melt down?
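Concretely, something like this is what I have in mind (a sketch only; the TripLeg model, its fields, and the leg_id filter are placeholders, not my real schema). Note it sums first and divides once, so the result is total miles over total gallons rather than an average of per-driver ratios:

    # Sketch: TripLeg, :miles, :gallons, and leg_id are hypothetical.
    map = <<-JS
      function() {
        emit("economy", { miles: this.miles, gallons: this.gallons });
      }
    JS

    reduce = <<-JS
      function(key, values) {
        var totals = { miles: 0, gallons: 0 };
        values.forEach(function(v) {
          totals.miles   += v.miles;
          totals.gallons += v.gallons;
        });
        return totals;  // same shape as the emitted values, so re-reduce is safe
      }
    JS

    # leg_id: placeholder for the shared leg's identifier
    doc = TripLeg.where(leg_id: leg_id).map_reduce(map, reduce).out(inline: 1).first
    avg_mpg = doc["value"]["miles"] / doc["value"]["gallons"]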

I'm using Mongoid in a Rails app. Is there any friction I'm injecting here, or will it streamline common use cases like insert, delete, update, and query just fine?
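For context, the day-to-day usage I mean is nothing exotic; the model and field names below are illustrative, not my actual schema:

    # Hypothetical Mongoid model for a trip leg.
    class TripLeg
      include Mongoid::Document
      field :miles,   type: Float
      field :gallons, type: Float
      field :start,   type: Array   # [longitude, latitude]
      field :finish,  type: Array   # [longitude, latitude]
      index({ start: "2d" })        # geo index for the nearness queries
    end

    leg = TripLeg.create!(miles: 212.4, gallons: 7.9,
                          start: [-122.42, 37.77], finish: [-118.24, 34.05])
    leg.update_attributes!(gallons: 8.1)    # update
    TripLeg.where(:miles.gt => 100).count   # query
    leg.destroy                             # delete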

The other candidate database is Postgres, which also handles location data, but it is not schemaless in the way Mongo is.

I recognize that some of this calls for opinion, but perhaps this is information that would benefit SO users.

Thanks!

  • Do you have time to wait for MongoDB to do MapReduce? It's a batch function that should be run in the background. MapReduce is not a realtime function: http://stackoverflow.com/questions/3947889/mongodb-terrible-mapreduce-performance – mttdbrd May 02 '14 at 17:46
  • Background computation would kind of mess up the idea of the app, which is to provide users with a complete answer to their question: "what kind of mileage are people getting on drives between these points?" It seems there's a mismatch between MongoDB and what I'm driving at here unless I'm misunderstanding you @mttdbrd. – Steve Ross May 02 '14 at 18:07
  • Well, I don't think MapReduce is going to work for you. Basically the idea of MapReduce is that you run it say once a day and for the rest of the next day, query the results of the operation. It's not a real-time operation, or it's not designed to be. It may be fast for (very) small data sets, but it's designed for large scale data processing. See here: https://en.wikipedia.org/wiki/MapReduce#Performance_considerations – mttdbrd May 02 '14 at 18:11
  • Actually, re-reading your post, you should be able to use MongoDB. The average fuel-efficiency won't change much from day to day, so you should be able to just generate a new fuel-efficiency MapReduce every day or every couple of days in the background. – mttdbrd May 02 '14 at 18:13
  • How many documents will you have for the MR step? – mttdbrd May 02 '14 at 18:18
  • 1
    would you be able to use the aggregation pipeline with $near, see http://docs.mongodb.org/manual/reference/operator/aggregation/geoNear/ rather than map/reduce? The spatial query has to be the first in the pipeline, but that would seem to fit with your use case. You can't shard on a geospatial field, but then you can't in Postgres/Postgis either -- 2d indexes raise particular problems in this regard. – John Powell May 03 '14 at 07:51
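A sketch of the aggregation route @John Powell suggests, with $geoNear as the first stage (this assumes the hypothetical TripLeg model above; coordinates and maxDistance are placeholders, and it filters on the start point only, since $geoNear works against a single geo field):

    # Sketch of the aggregation-pipeline alternative to map/reduce.
    # Assumes the "2d" index on :start; all values are placeholders.
    pipeline = [
      { "$geoNear" => {
          "near"          => [-122.42, 37.77],  # the query's start point
          "distanceField" => "dist",
          "maxDistance"   => 0.05               # in the units of the 2d index
      }},
      { "$group" => {
          "_id"     => nil,
          "miles"   => { "$sum" => "$miles" },
          "gallons" => { "$sum" => "$gallons" }
      }},
      { "$project" => {
          "_id"     => 0,
          "avg_mpg" => { "$divide" => ["$miles", "$gallons"] }
      }}
    ]

    TripLeg.collection.aggregate(pipeline).first  # => { "avg_mpg" => ... }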

0 Answers