Experimental data organization - would Elasticsearch work?

Question

Sorry if this is a bit of an abstract question; I'll try to provide more details.

I run "experiments" (e.g. test runs of various software). Each experiment has its own set of metadata (basically key/value pairs, such as start time, end time, name, resource cardinality, system type, etc.) and one or more time series of performance metrics (e.g. CPU and memory usage from start to end at 10-second intervals). The amount of data will not be huge: at most a few gigabytes per month.

I'd like to store this data in a single system (i.e. not metadata in MySQL and performance data in some specialized time series database). Would Elasticsearch be a good fit for this? How would I best index the data?

EDIT: to be clearer, here are some thoughts on how to organize the data. For the metadata, use a metadata index, for example like so for experiment aa_12:

{
  "_id": "aa_12",
  "_source": {
    "name": "aa_12",
    "start": 1420070400001,
    "end": 1420097400001,
    "system": "cluster-1",
    "nodes": 6,
    ...
  }
}

Having the experiment name as the _id makes the occasional updates easier (I suppose).
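To illustrate what such an update could look like, here is a minimal sketch that only builds the request (method, path, body) and does not send it. The helper name `update_request` and the index name `metadata` are assumptions for the example, and the path is the typeless form used by Elasticsearch 7 and later; older versions include a mapping type in the path.

```python
import json


def update_request(index, doc_id, changes):
    """Build a partial-update request for one document, addressed by its _id."""
    # The update API merges the fields given under "doc" into the stored document,
    # so only the changed metadata keys need to be sent.
    return ("POST", f"/{index}/_update/{doc_id}", {"doc": changes})


# Example: correct the end time of experiment aa_12 after the fact.
method, path, body = update_request("metadata", "aa_12", {"end": 1420097400001})
print(method, path, json.dumps(body))
```

Because the experiment name is the `_id`, no search is needed to find the document before updating it.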

Then, for the time series associated with this experiment, use an index perfdata, for example as follows:

{
  "_source": {
    "host": "cluster-1-1",
    "experiment": "aa_12",
    "cpu1": 44,
    "cpu5": 40,
    "cpu15": 41,
    "memtot": 16384,
    "memused": 5025,
    ... rest of metrics
    "time": 1420070410001
  }
}

so I could query, for example, "give me metric X for host Y for the duration of experiment Z" and get metric graphs using Kibana/Timelion. My concern at this point is that the perfdata index could grow to contain lots of entries (not very big in size overall, but still hundreds of thousands or millions of entries). Does the above make sense?
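As a rough sketch of that query, the request body for the search API could be built like this. The function name `metric_query` is made up for the example, and it assumes that `host` and `experiment` are mapped as `keyword` fields (so that `term` filters match the exact strings) and that `time` is mapped as a `date` holding epoch milliseconds.

```python
import json


def metric_query(metric, host, experiment_meta, size=10000):
    """Build a search body: one metric, one host, limited to one experiment."""
    return {
        # Return only the timestamp and the requested metric from each hit.
        "_source": ["time", metric],
        # 10000 is the default upper bound for a single search page
        # (index.max_result_window); longer runs would need paging.
        "size": size,
        "sort": [{"time": "asc"}],
        "query": {
            "bool": {
                # Filters do not contribute to scoring and can be cached.
                "filter": [
                    {"term": {"experiment": experiment_meta["name"]}},
                    {"term": {"host": host}},
                    {
                        "range": {
                            "time": {
                                "gte": experiment_meta["start"],
                                "lte": experiment_meta["end"],
                            }
                        }
                    },
                ]
            }
        },
    }


# Metadata as stored in the metadata index for experiment aa_12.
meta = {"name": "aa_12", "start": 1420070400001, "end": 1420097400001}
query_body = metric_query("cpu1", "cluster-1-1", meta)
print(json.dumps(query_body, indent=2))
```

Since every perfdata document already carries the experiment name, the time range is strictly redundant here; it is kept to show how the start and end from the metadata document would be used.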

score 1 · Answer 1 · answered Aug 08 '17 at 16:27

As far as I know:

InfluxDB and Cassandra are good choices for time series data
Elasticsearch is a good choice for metadata

Elasticsearch is built for search, though many people use it as a permanent data store by mitigating its resiliency issues with the Snapshot and Restore features. Here is a link on Elasticsearch resiliency

Further, if your answers to the questions below match these, then ES is the way to go.

Do you intend to use ES for searching? Yes
Aggregations, Full Text Search? Yes
Do you care about data resiliency? No

If you do care about data resiliency, I would recommend either storing the metadata in another store (e.g. MySQL) in addition to ES, or using the snapshot and restore feature of ES to maintain resiliency.
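For reference, the snapshot and restore route boils down to three REST calls. The sketch below only assembles them as (method, path, body) tuples; the repository name, snapshot name, and filesystem location are placeholders, and the index names are taken from the question.

```python
import json


def snapshot_requests(repo, location, snapshot, indices):
    """Return (method, path, body) tuples for registering a repository,
    taking a snapshot, and restoring it."""
    index_list = ",".join(indices)
    return [
        # Register a shared-filesystem repository. The location has to be
        # listed under path.repo in elasticsearch.yml on every node.
        ("PUT", f"/_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location}}),
        # Take a snapshot of the chosen indices and wait until it finishes.
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": index_list}),
        # Restore them later; existing indices with the same names must be
        # closed or deleted first.
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore",
         {"indices": index_list}),
    ]


# Placeholder names, not taken from any real setup.
requests_to_send = snapshot_requests(
    "experiments_backup", "/mnt/es_backups", "snapshot_1", ["metadata", "perfdata"]
)
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

Running the snapshot call on a schedule (e.g. from cron) gives you a restorable copy without needing a second database.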
