
We collect data from 10 sources every second. I'm trying to think of a way to store the data in memory that would allow me to get data like "Every point of data from 12:01 to 12:02".

I've thought about using a tree of some sort or a sorted list. The key would be the time collected and the value would be an array of the measurements. I can't think of how to say "Give me all values for keys in this range". I can only think to use the sorted nature of these structures to quickly get the values for a certain key.

Will I need to calculate the appropriate keys (12:01:00, 12:01:01, ..., 12:01:59) and pull each of those values independently or is there some way I can utilize a sorted data structure to get all my data at once?

Tyler DeWitt
  • [Redis](http://redis.io), [VoltDB](http://voltdb.com/), [Daybreak](http://propublica.github.io/daybreak/)... – Mark Thomas Mar 12 '14 at 18:50
  • I thought Redis was only a key/value store. Is there some extra functionality to operate on the keys? – Tyler DeWitt Mar 12 '14 at 19:18
  • I don't get it. Isn't what you want just a range query? Any sorted data structure can give you that. If you store the data in a sorted array, why can't you just ask for all the elements between 12:01 and 12:05? But since you clearly need to insert a lot, I would suggest using a B+ tree. – Leo Mar 12 '14 at 20:02
  • I'm probably missing something obvious, but I can't think of how to do a range query on a sorted data structure. Given a list of keys, I could easily grab the appropriate values one at a time, but I can't think of how to do that in one fell swoop – Tyler DeWitt Mar 12 '14 at 20:39
  • Yes, Redis is more of a data structure server. Even the values stored under a key can be data structures. It supports hashes, lists, sets, etc. In your case you probably want a sorted set, with the timestamp as the score. You can then run a ZRANGEBYSCORE query (plain ZRANGE selects by rank, not by score) to get the members whose scores fall in a certain range. – Mark Thomas Mar 12 '14 at 21:03
  • WTF ignore my answer. I thought this was a Python question. – Niklas B. Mar 12 '14 at 22:28

1 Answer


A Ruby Hash can't do this on its own: it has no range lookup over keys, and a general-purpose solution is not easy to implement yourself. You would have to write a binary tree, B-tree, or red-black tree, along with a clean API for range queries.
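
That said, if samples only ever arrive in time order (which seems likely for once-a-second collection, though it is an assumption), a plain sorted Array plus binary search may be enough. Here is a minimal sketch of that idea using `Array#bsearch_index` (Ruby 2.3+). The `TimeSeries` class and its `add` and `range` methods are hypothetical names made up for illustration, and timestamps are assumed to be integer epoch seconds:

```ruby
# Hypothetical in-memory store; class and method names are illustrative only.
class TimeSeries
  def initialize
    @times  = [] # Integer epoch seconds, kept in ascending order
    @values = [] # @values[i] holds the measurements taken at @times[i]
  end

  # Assumes samples arrive in non-decreasing timestamp order.
  def add(time, measurements)
    if !@times.empty? && time < @times.last
      raise ArgumentError, "out-of-order sample"
    end
    @times << time
    @values << measurements
    self
  end

  # Returns [time, measurements] pairs with from <= time < to.
  def range(from, to)
    # bsearch_index returns nil when no element satisfies the block,
    # so fall back to the array size (an empty slice at the end).
    lo = @times.bsearch_index { |t| t >= from } || @times.size
    hi = @times.bsearch_index { |t| t >= to } || @times.size
    @times[lo...hi].zip(@values[lo...hi])
  end
end

# Three minutes of fake data from 10 sources, one sample per second.
series = TimeSeries.new
start = Time.utc(2014, 3, 12, 12, 0, 0).to_i
180.times { |i| series.add(start + i, Array.new(10) { |source| source * i }) }

# Everything from 12:01:00 up to (but not including) 12:02:00.
minute = series.range(start + 60, start + 120)
```

Each lookup is two binary searches plus one slice, so finding the bounds is O(log n) and copying the k matching samples is O(k). Out-of-order inserts are rejected here; supporting them is where a real tree would come in.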

My suggestion would be to use an in-memory SQLite database. Then you can query it with familiar SQL syntax. SQLite is a very well done project, and I'd expect it to be efficient in both space and time.

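require 'sqlite3'

# Open a database that lives only in memory; it is discarded when the connection closes.
@db = SQLite3::Database.new ":memory:"

# One row per sample: the collection time plus the measurements stored as text.
@db.execute <<-SQL
  create table records(
    id integer primary key autoincrement,
    timestamp integer,
    data text
  );
SQL

# Query with ordinary SQL; the WHERE clause is left elided here.
@db.execute <<-SQL
  select * from records where ...
SQL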

Read about SQLite data types here: http://www.sqlite.org/datatype3.html

John Bachir