
I still don't quite get "segmentGranularity" in Druid. This page is quite ambiguous: http://druid.io/docs/latest/design/segments.html . It keeps mentioning segmentGranularity, but it mostly talks about intervals (at least in the first paragraph).

Anyway, at this point the volume of my data is not that big. That page mentions 300 MB-700 MB as the "ideal" size of a segment, and I can actually fit a week of data into one segment. That's why I'm thinking of setting segmentGranularity to "week" in my indexing-task JSON:

  "granularitySpec" : {
    "type" : "uniform",
    "segmentGranularity" : "week",
    "queryGranularity" : "none",
    "intervals" : ["2015-09-12/2015-09-13"]
  },

However, I plan to do the batch indexing every hour (and each run will normally only (re)process data within that same day). That's why I put only one interval, spanning one day, in the "intervals" field above.

My question: how would that work when segmentGranularity is set to "week" (instead of "day")? Will it rebuild the cube for the entire week-long segment? That is something I don't want; I only want to rebuild the cube for the day.

Thanks, Raka

Cokorda Raka

1 Answer


Yes, segmentGranularity specifies the time span of data kept in a particular segment. If it is set to "week", then each segment holds the data of one particular week.
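
For example (the datasource name and version below are made up, just to illustrate the interval boundaries): with segmentGranularity "week", an event timestamped 2015-09-12 lands in a segment covering the whole ISO week, while with "day" it would land in a segment covering only that day:

  # segmentGranularity "week": the interval spans Monday to Monday
  mydatasource_2015-09-07T00:00:00.000Z_2015-09-14T00:00:00.000Z_<version>

  # segmentGranularity "day": the interval spans a single day
  mydatasource_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_<version>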

Now, if you are going to run the ingestion task every hour, the entire segment gets rebuilt each time. Since you only add data for the current day, it's generally better to set your segmentGranularity to "day".
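
That would be a one-word change to the granularitySpec from your question (everything else stays the same):

  "granularitySpec" : {
    "type" : "uniform",
    "segmentGranularity" : "day",
    "queryGranularity" : "none",
    "intervals" : ["2015-09-12/2015-09-13"]
  },

With this, your hourly task only ever touches the daily segment(s) that overlap the interval you pass in.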

But you can very well keep the segmentGranularity at "week" if your data is small; it shouldn't matter much that Druid rebuilds the whole segment.

Since your data set is small, you can also look into Tranquility Server, which can ingest data on the fly without batch ingestion. It should do fine for your use case.
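
Roughly, you run Tranquility Server with a JSON config describing your dataSource, then POST events to it over HTTP, something like this (host, port and dataSource name are just examples, check the Tranquility docs for your setup):

  curl -X POST -H 'Content-Type: application/json' \
    http://localhost:8200/v1/post/mydatasource \
    -d '{"timestamp": "2015-09-12T10:00:00Z", "page": "foo", "count": 1}'

Keep in mind that Tranquility only accepts events whose timestamps fall within its configured windowPeriod around the current time, so it is meant for streaming new data rather than re-processing an old day.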

mdeora