8

I have been using MySQL for all my data storage and querying, but the tables have now grown so large that queries sometimes take hours to return results. I have taken every possible measure, such as optimizing the queries and indexing the tables properly.

So I have been thinking of using Druid. I have not worked with Druid before; I believe I will have to import all the tables and data from MySQL into Druid, but I cannot figure out where to start. If anyone could kindly offer any guidance, I would be really grateful. Thanks in advance.

Joy
  • Anyone who needs info on performance comparisons between MySQL and Druid, see [the blog at druid.io](http://druid.io/blog/2014/03/12/batch-ingestion.html). – user766353 Mar 18 '14 at 17:39

2 Answers

9

First note that Druid ingests timeseries data, so each row of your data will have to have a timestamp. If that's possible, read on.

Output your data to CSV or TSV. Those are two of the formats supported for batch ingestion. So your data will look something like this:

2013-08-31T01:02:33Z,"someData","true","true","false","false",57,200,-143
2013-08-31T03:32:45Z,"moreData","false","true","true","false",459,129,330
...
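
To get the data into that shape, MySQL's SELECT ... INTO OUTFILE can write a table straight to CSV. Here is a minimal sketch, assuming a hypothetical table events with a DATETIME column created_at (stored in UTC) plus a few dimension and metric columns; the names are illustrative, and the server's secure_file_priv setting may restrict where the file can be written:

SELECT
  DATE_FORMAT(created_at, '%Y-%m-%dT%H:%i:%sZ'),  -- ISO-8601 timestamp for Druid
  data1, data2, data3, metric1, metric2
FROM events
INTO OUTFILE '/tmp/events.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';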

Then you can create an index task which has a firehose section where you specify the location of the file, format, and columns:

"firehose" : {
   "type" : "local",
   "baseDir" : "my/directory/",
   "filter" : "my.csv",
   "parser" : {
     "timestampSpec" : {
       "column" : "timestamp"
     },
     "data" : {
       "type" : "csv",
       "columns" : ["timestamp","data1","data2","data3",...,"datan"],
       "dimensions" : ["data1","data2","data3",...,"datan"]
     }
   }
 }

Note the special handling given to the timestamp column.
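
For context, the firehose above is only one section of the index task. Very roughly, the enclosing task looks like the sketch below; the dataSource name, interval, and aggregators are placeholders, the field names follow the older spec style used here, and newer Druid versions lay the spec out differently, so treat this as illustrative:

{
  "type" : "index",
  "dataSource" : "my_mysql_table",
  "granularitySpec" : {
    "type" : "uniform",
    "gran" : "DAY",
    "intervals" : [ "2013-08-31/2013-09-01" ]
  },
  "aggregators" : [
    { "type" : "count", "name" : "count" },
    { "type" : "doubleSum", "name" : "datan_sum", "fieldName" : "datan" }
  ],
  "firehose" : { ...the firehose section shown above... }
}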

Now run the indexing service (the Druid docs contain info on how to start the cluster you'll need) and feed the task to it as described in the section Batch Ingestion Using the Indexing Service. The data will be ingested and processed into Druid segments that you can query.
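
Submitting the task is just an HTTP POST to the indexing service. For example, assuming the overlord is running locally on its default port and the task spec is saved as my_index_task.json:

curl -X POST -H 'Content-Type: application/json' \
     -d @my_index_task.json \
     http://localhost:8090/druid/indexer/v1/task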

user766353
  • Is the above spec still correct with latest version of druid? – jagamot Apr 06 '16 at 12:36
  • what if I have multiple tables? – Crystal Jun 19 '16 at 21:30
  • The latest version has docs giving an example of both the csv format and config.json used for ingesting [here](http://druid.io/docs/latest/ingestion/data-formats.html). – Max Mar 07 '18 at 18:28
  • No need to export the data to CSV; use SQL input ingestion: https://druid.apache.org/docs/latest/ingestion/native-batch.html#sql-input-source – noleto Jun 01 '21 at 12:46
1

First of all, I don't think Druid works better as a storage system; it is more accurate to describe it as a real-time query and processing tool. Moreover, I would prefer a Hive + HDFS combination to replace your MySQL setup, since your table sizes keep growing. What's more, Druid can use HDFS as its deep storage.
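
If you do go that route, pointing Druid at HDFS for deep storage comes down to loading the HDFS storage extension and setting the storage properties in common.runtime.properties. A rough sketch, with an illustrative namenode address:

# common.runtime.properties (values are illustrative)
druid.extensions.loadList=["druid-hdfs-storage"]
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://namenode:9000/druid/segments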

shen