Apache Drill configuration

Question

I need to add storage plugins for Apache Drill (basically PSVs), but I am unable to find the configuration file where I could add the following lines:

 "formats": {
   "psv": {
     "type": "text",
     "extensions": [
       "tbl"
     ],
     "delimiter": "|"
   }
}

Note that the usual solution of opening the localhost URL in a web browser is not feasible: I don't want to expose the port and IP to the Internet. Currently I use a double-hop SSH connection to reach the server that hosts Drill.

score 1 · Accepted Answer · answered Feb 25 '15 at 21:18

You can post to Drill's REST API:

curl -X POST -H "Content-Type: application/json" -d '{"name": "dfs", "config": {"type": "file", "connection": "hdfs:///", "enabled": true, "workspaces": {"root": {"location": "/", "writable": false, "defaultInputFormat": null}}, "formats": {"psv": {"type": "text", "extensions": ["tbl"], "delimiter": "|"}}}}' http://localhost:8047/storage/dfs.json
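If quoting the JSON in the shell gets fiddly, the same request can be built in a script and run on the Drill host itself, so nothing has to be exposed beyond the SSH session. The following is only a rough sketch: the function names `build_plugin_payload` and `post_plugin` are made up for illustration, and the endpoint and payload shape are simply taken from the curl command above.

```python
import json
import urllib.request


def build_plugin_payload(name="dfs"):
    """Build the storage plugin definition as a dict (mirrors the curl payload)."""
    return {
        "name": name,
        "config": {
            "type": "file",
            "connection": "hdfs:///",
            "enabled": True,
            "workspaces": {
                "root": {
                    "location": "/",
                    "writable": False,
                    "defaultInputFormat": None,
                }
            },
            "formats": {
                # Pipe-separated files with the .tbl extension
                "psv": {"type": "text", "extensions": ["tbl"], "delimiter": "|"}
            },
        },
    }


def post_plugin(payload, base_url="http://localhost:8047"):
    """POST the payload to Drill's REST endpoint; base_url is an assumption."""
    url = f"{base_url}/storage/{payload['name']}.json"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print the JSON first so it can be inspected before sending anything.
    print(json.dumps(build_plugin_payload(), indent=2))
```

Calling `post_plugin(build_plugin_payload())` on the server would send the request; letting `json.dumps` produce the body avoids the missing-brace and smart-quote problems that are easy to hit when typing the JSON by hand.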

You can also create a bootstrap-storage-plugins.json file and include it on the classpath when starting Drill; it should be loaded when Drill boots up.
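As a rough illustration of the bootstrap-file route, the sketch below generates such a file. The top-level "storage" key holding plugins by name is my understanding of the layout Drill expects, so please check it against the bootstrap file shipped with your Drill version; the helper names `build_bootstrap` and `write_bootstrap` and the target directory are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def build_bootstrap(connection="hdfs:///"):
    """Return the bootstrap content; the "storage" wrapper is an assumption."""
    return {
        "storage": {
            "dfs": {
                "type": "file",
                "connection": connection,
                "enabled": True,
                "workspaces": {
                    "root": {
                        "location": "/",
                        "writable": False,
                        "defaultInputFormat": None,
                    }
                },
                "formats": {
                    "psv": {"type": "text", "extensions": ["tbl"], "delimiter": "|"}
                },
            }
        }
    }


def write_bootstrap(directory):
    """Write bootstrap-storage-plugins.json into a directory and return its path."""
    path = Path(directory) / "bootstrap-storage-plugins.json"
    path.write_text(json.dumps(build_bootstrap(), indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo only: writes to a temporary directory rather than a real Drill install.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_bootstrap(tmp).read_text(encoding="utf-8"))
```

In practice the file would go into a directory that is on Drill's classpath. As far as I know, Drill only reads the bootstrap file when no plugin configuration has been stored yet, so it may be ignored on an installation that has already been started once.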

Chris Matta

Thanks for the help. However, I have decided not to use Apache Drill. I had mistaken it for a replacement for Hive. – Mangat Rai Modi Feb 26 '15 at 06:55
Apache Drill can certainly be considered a replacement for Hive. What is Drill not doing that Hive is? Just curious. – Chris Matta Feb 27 '15 at 04:13
I am quite a novice in the Big Data stack, but what I understood is that Drill is a replacement for Infobright. In Hive you can load CSVs at runtime and execute MapReduce SQL queries. Overall, SQL queries will take more time, as Hive doesn't do any indexing or processing on the table we have imported the CSV into. With Drill, I understood that you create a data warehouse by specifying CSVs in the configuration. Drill will then create its datastore, replicate the data into its tables, and do the processing. I believe Drill will consume more disk space but will execute queries really fast. [1/2] – Mangat Rai Modi Feb 27 '15 at 07:43
So Hive is best suited for ad hoc batch queries, but Drill is for real-time queries! Please correct me; there is a good chance that I am wrong. [2/2] – Mangat Rai Modi Feb 27 '15 at 07:43
Hive essentially leverages Hadoop and launches MapReduce jobs, and is very useful for batch queries. Drill, on the other hand, is based on Google's Dremel, which is more useful for interactive ad hoc queries. (Link: http://stackoverflow.com/questions/6607552/what-is-googles-dremel-how-is-it-different-from-mapreduce) – Yash Sharma Feb 27 '15 at 13:36
Drill does not replicate any data into its tables; it addresses the data in place in the HDFS filesystem. Drill's configuration allows you to point at a directory or file and query it as it lies, with no ETL or schema required. Drill will not take up extra disk space. – Chris Matta Feb 27 '15 at 14:22
@ChrisMatta can you please have a look at http://stackoverflow.com/questions/31962882/exception-in-using-bootstrap-storage-plugins-json-file-for-storage-plugin-in-apa – Dev Aug 12 '15 at 10:44

score 0 · Answer 2 · answered Jun 29 '15 at 21:03

You can also use the Drill UI. Once Drill is started, the UI is available on port 8047 (the default). In the UI, click on Storage to see all the enabled and disabled storage plugins; you can add or create additional storage plugins from there.

Gopi Kolla
