There are several SO questions about time series databases , but none that address my specific concerns, and although this one comes closest, it's 3 years old.
Requirements:
- Multiple datasets. It doesn't matter how they're organized (separate tables, databases, processes, files, etc.).
- Single host operation (at least initially), so we're limited to approximately 1TB disk and 10GB RAM.
- Read latency/throughput are the key performance metrics.
Data behavior:
- Datasets are append-only, and records are immutable.
- Every record (independent of dataset) needs to be timestamped.
- Records will be 32-bit or 64-bit integers in "simple" datasets, while more "complex" data sets will be vectors of integers between 32-bit and 256-bit each, not exceeding about 1kb per entry.
- There will be one primary "large" table, holding 200M or more entries of a "complex" (see previous point) nature.
- There will be many (10 < N < 100) small(er) datasets (both "simple" and "complex") with perhaps on-the-order-of millions of records each.
Wishlist:
- Starting off on a single host, we really want to avoid complex "Big Data"-y dependencies for the backend (such as HBase), while simpler alternatives would be considered. This takes e.g. OpenTSBD off the table.
- Friendly bindings in a high-ish level language. Ruby, Python, PHP, etc. but we can go down to C, C++, Java, etc. if we can't avoid it.
- Streaming/pubsub/realtime API preferable.
- Custom queries - we'll need more than just simple statistical mean/median/mode/std-dev operations, and it would be great if we can codify our analysis into "native" queries/commands/structures rather than read out all of the data just to calculate everything in the application code.
OpenTSBD is based on HBase, TempoDB won't work on a cost/performance basis, Redis, Mongo, CouchDB, etc all seem like they would choke on this volume of data, and we're left wondering if we're dreaming. Correct me if I'm underestimating any of the mentioned systems (or their contemporaries). Does something like this exist? If not, would we be able to get the job done by yielding on just one of the listed requirements or wishes?