8

I have been using MySQL for all my data storage and querying, but the tables have now grown so large that queries sometimes take hours to return results. I have taken every possible measure, such as optimizing the queries and indexing the tables properly.

So I have been thinking of using Druid. I have not worked with Druid before; I believe I will have to import all the tables and data from MySQL into Druid, but I cannot figure out where to start. If anyone could kindly offer any guidance, I would be really grateful. Thanks in advance.

Joy
  • Anyone who needs info on performance comparisons between MySQL and Druid, see [the blog at druid.io](http://druid.io/blog/2014/03/12/batch-ingestion.html). – user766353 Mar 18 '14 at 17:39

2 Answers

9

First note that Druid ingests timeseries data, so each row of your data will have to have a timestamp. If that's possible, read on.

Output your data to CSV or TSV. Those are two of the formats supported for batch ingestion. So your data will look something like this:

2013-08-31T01:02:33Z,"someData","true","true","false","false",57,200,-143
2013-08-31T03:32:45Z,"moreData","false","true","true","false",459,129,330
...
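
To get the data into that shape, MySQL's SELECT ... INTO OUTFILE can write a table straight to CSV. Here is a minimal sketch, assuming a hypothetical table events with a DATETIME column created_at (stored in UTC) plus a few dimension and metric columns; the names are illustrative, and the server's secure_file_priv setting may restrict where the file can be written:

SELECT
  DATE_FORMAT(created_at, '%Y-%m-%dT%H:%i:%sZ'),  -- ISO-8601 timestamp for Druid
  data1, data2, data3, metric1, metric2
FROM events
INTO OUTFILE '/tmp/events.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';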

Then you can create an index task which has a firehose section where you specify the location of the file, format, and columns:

"firehose" : {
   "type" : "local",
   "baseDir" : "my/directory/",
   "filter" : "my.csv",
   "parser" : {
     "timestampSpec" : {
       "column" : "timestamp"
     },
     "data" : {
       "type" : "csv",
       "columns" : ["timestamp","data1","data2","data3",...,"datan"],
       "dimensions" : ["data1","data2","data3",...,"datan"]
     }
   }
 }

Note the special handling given to the timestamp column.
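
For context, the firehose above is only one section of the index task. Very roughly, the enclosing task looks like the sketch below; the dataSource name, interval, and aggregators are placeholders, the field names follow the older spec style used here, and newer Druid versions lay the spec out differently, so treat this as illustrative:

{
  "type" : "index",
  "dataSource" : "my_mysql_table",
  "granularitySpec" : {
    "type" : "uniform",
    "gran" : "DAY",
    "intervals" : [ "2013-08-31/2013-09-01" ]
  },
  "aggregators" : [
    { "type" : "count", "name" : "count" },
    { "type" : "doubleSum", "name" : "datan_sum", "fieldName" : "datan" }
  ],
  "firehose" : { ...the firehose section shown above... }
}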

Now run the indexing service (the Druid docs contain info on how to start the cluster you'll need) and feed the task to it as described in the section Batch Ingestion Using the Indexing Service. The data will be ingested and processed into Druid segments that you can query.
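
Submitting the task is just an HTTP POST to the indexing service. For example, assuming the overlord is running locally on its default port and the task spec is saved as my_index_task.json:

curl -X POST -H 'Content-Type: application/json' \
     -d @my_index_task.json \
     http://localhost:8090/druid/indexer/v1/task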

user766353
  • Is the above spec still correct with latest version of druid? – jagamot Apr 06 '16 at 12:36
  • what if I have multiple tables? – Crystal Jun 19 '16 at 21:30
  • The latest version has docs giving an example of both the csv format and config.json used for ingesting [here](http://druid.io/docs/latest/ingestion/data-formats.html). – Max Mar 07 '18 at 18:28
  • No need to export the data to CSV; use SQL input ingestion: https://druid.apache.org/docs/latest/ingestion/native-batch.html#sql-input-source – noleto Jun 01 '21 at 12:46
1

First of all, I don't think Druid works better as a storage system; it is more accurate to describe it as a real-time query and processing tool. Moreover, I would prefer a Hive + HDFS combination to replace your MySQL setup, since your table sizes keep growing. What's more, Druid can use HDFS as its deep storage.
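
If you do go that route, pointing Druid at HDFS for deep storage comes down to loading the HDFS storage extension and setting the storage properties in common.runtime.properties. A rough sketch, with an illustrative namenode address:

# common.runtime.properties (values are illustrative)
druid.extensions.loadList=["druid-hdfs-storage"]
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://namenode:9000/druid/segments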

shen