I'm trying to run Apache Drill on a single node, following an article on accessing HDFS from embedded Drill, but I'm getting errors:

➜  Apps /home/hph_etl/Apps/apache-drill-1.16.0/bin/sqlline -u "jdbc:drill:zk=local;schema=dfs"

...

apache drill (dfs)> select * from dfs.`tmp/`;
Error: RESOURCE ERROR: Failed to load schema for "dfs"!

java.net.ConnectException: Call From HW04.ucera.local/172.18.4.49 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused


[Error Id: 2fd541ee-2290-4cf8-979b-aca3c77859e2 ] (state=,code=0)
apache drill (dfs)> !q
Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl

The dfs storage plugin config looks like this:

{
  "type": "file",
  "connection": "hdfs://localhost:8020/",
  "config": null,
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
   ....
}

Note that I don't really know how to determine which port the HDFS connection is supposed to use, and the link in the error message (http://wiki.apache.org/hadoop/ConnectionRefused) goes nowhere. Attempting an alternate solution from another SO post, which connects directly to a drillbit instead of using zk=local, also throws errors:

➜  Apps /home/hph_etl/Apps/apache-drill-1.16.0/bin/sqlline -u "jdbc:drill:drillbit=localhost:31010;schema=dfs"
Error: Failure in connecting to Drill: org.apache.drill.exec.rpc.RpcException: CONNECTION : io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:31010 (state=,code=0)
java.sql.SQLNonTransientConnectionException: Failure in connecting to Drill: org.apache.drill.exec.rpc.RpcException: CONNECTION : io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:31010
    at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:178)
    at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:67)

I'm not sure what to check at this point. Any debugging suggestions or fixes?

lampShadesDrifter

1 Answer

Ultimately, what worked was changing the host in the `connection` URI from localhost to the IP of the Hadoop cluster's namenode (taken from another SO post on connecting to HDFS in general), so the Drill dfs storage plugin config looks like:

{
  "type": "file",
  "connection": "hdfs://<namenode-ip>:8020/",
  "config": null,
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
   ....
}

and the connection now goes through:

➜  bin /home/hph_etl/Apps/apache-drill-1.16.0/bin/sqlline -u "jdbc:drill:zk=local;schema=dfs"
Apache Drill 1.16.0
"Got Drill?"
apache drill (dfs)> select * from dfs.`tmp/`;
Error: PERMISSION ERROR: Not authorized to read table [tmp/] in schema [dfs.default]


[Error Id: 2e248da5-ba30-43f7-a983-1784d77cf81b ] (state=,code=0)
apache drill (dfs)> 

(Note that there is now a permissions error that I still need to fix, but I can at least attempt to query the location now.)
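
For anyone debugging the same "Connection refused", here is a minimal sketch of how the host and port in the plugin's `connection` URI could be checked before starting sqlline. It is not part of Drill: the helper names `hdfs_endpoint` and `can_connect` are made up for illustration, and the fallback to port 8020 is only an assumption for URIs that omit the port. On a Hadoop node, `hdfs getconf -confKey fs.defaultFS` should print the namenode URI the cluster is actually configured with, which is the value to compare against.

```python
import json
import socket
from typing import Tuple
from urllib.parse import urlparse


def hdfs_endpoint(plugin_config_json: str) -> Tuple[str, int]:
    """Extract (host, port) from the "connection" URI of a dfs plugin config."""
    config = json.loads(plugin_config_json)
    uri = urlparse(config["connection"])
    if uri.scheme != "hdfs" or uri.hostname is None:
        raise ValueError(f"not an hdfs:// URI: {config['connection']!r}")
    # Assumption: fall back to 8020 when the URI has no explicit port.
    # 8020 is a common namenode RPC port, but the cluster may use another one.
    return uri.hostname, uri.port or 8020


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Same connection value as in the failing config from the question.
    plugin = '{"type": "file", "connection": "hdfs://localhost:8020/"}'
    host, port = hdfs_endpoint(plugin)
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

If this reports the endpoint as unreachable from the machine running Drill, the `connection` value is probably pointing at the wrong host or port, which matches the original error. A successful TCP connection only shows that something is listening there; it does not prove that the listener is the namenode or that the Drill user has permission to read the path.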

lampShadesDrifter