How to store 800 billion GPS markers in database

Question

I need to store GPS tracks that users record into a database. The tracks will consist of a marker every 5 meter of movement for the purpose of drawing a line on a map. I am estimating 200 km tracks which means 40,000 lnlt markers. I estimate 50,000 users minimum and 20 pieces of 200 km tracks for each. That means at least 40 billion lnlt markers.

This needs to scale too, so for 1 million users I need capacity for 800 billion GPS markers.

Since each set of 40,000 markers belong to a single track, we are talking 1 - 20 million records/sets of GPS tracks.

Requirements: Users will request to view these tracks on top of a Google map in a mobile application.

Relations: I currently have 2 tables. Table one has:[trackid], [userid], [comment], [distance], [time], [top speed].

Table 2 has [trackid] [longitude] [latitude] and this is where all GPS markers are stored. What is an efficient way of storing this volume of GPS data while maintaining read performance?

New information:

Storing the GPS data in a KML file for the purpose of displaying them as a track on top of a Google map is a good solution that saves database space. Compressing the KML into a KMZ (basically a zipped KML wit a KMZ extension) greatly reduces file size further. KMZ loads much quicker than GPX and can be integrated with the Google Maps API as a KML layer. See this information from Google for further assistance. This seems to be the best solution so far for the intended requirement.

You can even store it in a flat files, one marker has fixed length, so you can seek the file easily. — Andrey, Oct 10 '12 at 18:21
I'm not sure that product recommendations are on-topic or constructive per the guidelines of the site... But it sounds like this is highly normalized data. Any good RDBMS should work. — David, Oct 10 '12 at 18:21
@DavidStratton I think that it is great overhead to store this sort of data in RDBMS. It is basically binary data. — Andrey, Oct 10 '12 at 18:23
If RDBMS I would have one table with: [trackid], [userid], [comment], [distance], [time], [top speed]. And another table with [trackid] [longitude] [latitude]. I am just worried that such a gigantic table would slow things down. Users will request these tracks at any given point which requires them to be fetched from the big table. — Karl, Oct 10 '12 at 18:45
@Karl I would defiantly store this data in Table One with the other data as a binary column. If you are not interested in querying the positions in the database there is not reason the store them in a separate table. Also the space needed to store that data would be quite big. To Bad the question got closed, it is an interesting problem. — Magnus, Oct 11 '12 at 07:38
@karl could you get away with a [trackID][lat][Long]then have a second table with a [trackID][bearingID][Bearing][time][speed] then you need only record when the bearing has changed significantly to require a new bearing record. You would need to test your latitude and longitude values to get the exact bearing, but you could easily write a function to consider only bearing changes of a certain magnitude. You could even allow a user function to alter the level of granularity of the bearing calculation based on mode of travel (eg. if traveling by car you will probably follow straighter lines!) — DaveM, Feb 22 '13 at 12:00
@DaveM I decided to go with MongoDB. One collection holds record information such as [trackid][userid][distance][time][speed][comment]. The trackID refers to a CSV file with longitude/latitude/timestamp. To keep the filesize down I record a new GPS point with constraints: min distance 5 meter, max distance 200 meter, location frequency check set to smallest. If a user is cruising down the freeway it won't waste space. Likewise, detail is preserved if driving slowly in traffic. This was googles solution from a different app they made. — Karl, Apr 03 '13 at 11:34
@karl "This was googles solution from a different app they made" +1 to open source. Glad he idea helped you find a suitable answer to your problem. — DaveM, Apr 03 '13 at 14:36

jrd1 · Answer 1 · 2012-10-11T01:12:29.690

7

The choice of a particular database, as always, is tied to how you want to store the information and how you want to use it. As such, without knowing the exact requirements of your project, as well as the relationships of the data, the best thing to do would be to do some reading on the topic to determine what particular product or storage model is best suited to you.

A good place to start is reading blogs that compare the performance and uses of the databases (see attached):

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

edited Oct 11 '12 at 01:12

answered Oct 10 '12 at 18:24

jrd1

10,358
4
34
51

1

Well noted. I changed the phrasing of my question and added the requirements for this issue and the relations between current tables. Maybe it can be reopened? – Karl Oct 10 '12 at 19:01
2

More information for anyone interested in managing large amounts of GPS marker data can read here https://developers.google.com/maps/articles/toomanymarkers about Google solutions such as fusion tables. – Karl Oct 10 '12 at 19:14
Good sportsmanship. Honestly, I think your question is still too broad, as you are trying to cover many different areas at once. Personally, I do not have enough privilege to reopen your question (it requires 3000 reputation). For more on reopening questions, see [this](http://meta.stackexchange.com/questions/36415/how-do-you-reopen-a-closed-question). – jrd1 Oct 10 '12 at 19:31
3

No problem. I will just keep adding new information as I research this topic further since others might be interested in solutions to similar problems. – Karl Oct 11 '12 at 05:57
Good stuff @Karl, I think with all the edits and comments you've made, your question stands a good chance that it may be reopened. – jrd1 Oct 11 '12 at 19:25
I have a similar problem (design). @Karl have you found any solutions or further suggestions about this. – Skeletor Mar 20 '14 at 09:17
1

@Skeletor my solution was to save list of related markers in a kml file and give it a unique Id that is referenced in the database. That way the database can be cut in size by 4000 times and only contains references instead of static datapoints. – Karl Mar 21 '14 at 14:04
yes that is a smart way. thank you @Karl i will keep that in mind. – Skeletor Mar 21 '14 at 14:51

How to store 800 billion GPS markers in database

1 Answers1

Linked