
I have shark-0.8.0 running on hive-0.9.0. I can run Hive queries by invoking shark, and I have created a few tables and loaded data into them.

Now I am trying to access the data in these tables from Scala. I started the Scala shell with shark-shell, but when I run a select, I get an error saying the table was not found.

scala> val artists = sc.sql2rdd("select artist from default.lastfm")

Hive history file=/tmp/hduser2/hive_job_log_hduser2_201405091617_1513149542.txt
151.738: [GC 317312K->83626K(1005568K), 0.0975990 secs]
151.836: [Full GC 83626K->76005K(1005568K), 0.4523880 secs]
152.313: [GC 80536K->76140K(1005568K), 0.0030990 secs]
152.316: [Full GC 76140K->62214K(1005568K), 0.1716240 secs]
FAILED: Error in semantic analysis: Line 1:19 Table not found 'lastfm'
shark.api.QueryExecutionException: FAILED: Error in semantic analysis: Line 1:19 Table not found 'lastfm'
    at shark.SharkDriver.tableRdd(SharkDriver.scala:149)
    at shark.SharkContext.sql2rdd(SharkContext.scala:100)
    at <init>(<console>:17)
    at <init>(<console>:22)
    at <init>(<console>:24)
    at <init>(<console>:26)
    at <init>(<console>:28)
    at <init>(<console>:30)
    at <init>(<console>:32)
    at .<init>(<console>:36)
    at .<clinit>(<console>)
    at .<init>(<console>:11)
    at .<clinit>(<console>)
    at $export(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:629)
    at org.apache.spark.repl.SparkIMain$Request$$anonfun$10.apply(SparkIMain.scala:890)
    at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
    at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
    at java.lang.Thread.run(Thread.java:744)

According to the documentation (https://github.com/amplab/shark/wiki/Shark-User-Guide), these steps should be enough to get Shark up and running and to select data using Scala. Am I missing something? Is there a configuration file that needs to be modified to enable access to Shark from shark-shell?

visakh

1 Answer


Have you updated the Hive configuration in your Shark Hive directory so that it contains the correct JDBC connection info for the Hive metastore?

You will need to copy hive-default.xml to hive-site.xml, then make sure the metastore properties are set.

Here is the basic info in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://myhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>

You can get more details here: configuring hive metastore
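
To confirm which metastore a given hive-site.xml actually points at, a small check along the following lines may help. This is only a sketch: `MetastoreCheck`, `readProperties` and `describe` are hypothetical names introduced here for illustration, the file path is whatever you pass on the command line, and only JDK XML classes are used. It relies on the fact that when `javax.jdo.option.ConnectionURL` is not set, Hive falls back to an embedded Derby database at the relative path `metastore_db`, so processes started from different working directories can each end up with their own, separate metastore. Whether that is the cause of the missing table in this particular setup is an assumption.

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper, not part of Shark or Hive.
object MetastoreCheck {

  /** Reads the <property><name>/<value> pairs from a Hadoop-style XML config file. */
  def readProperties(file: File): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file)
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).flatMap { i =>
      val p = props.item(i).asInstanceOf[Element]
      val names = p.getElementsByTagName("name")
      val values = p.getElementsByTagName("value")
      // Skip malformed entries that lack a name or a value.
      if (names.getLength > 0 && values.getLength > 0)
        Some(names.item(0).getTextContent.trim -> values.item(0).getTextContent.trim)
      else None
    }.toMap
  }

  /** Gives a rough, human-readable description of a metastore JDBC URL. */
  def describe(url: Option[String]): String = url match {
    case None =>
      "ConnectionURL not set: Hive defaults to embedded Derby (relative path metastore_db)"
    case Some(u) if u.startsWith("jdbc:derby:") && !u.startsWith("jdbc:derby://") =>
      "Embedded Derby: " + u
    case Some(u) =>
      "External database: " + u
  }

  def main(args: Array[String]): Unit = {
    // args(0) is the path to the hive-site.xml you want to inspect.
    val props = readProperties(new File(args(0)))
    println(describe(props.get("javax.jdo.option.ConnectionURL")))
  }
}
```

The object can also be pasted into shark-shell and called directly, for example `MetastoreCheck.describe(MetastoreCheck.readProperties(new java.io.File(path)).get("javax.jdo.option.ConnectionURL"))`, where `path` is the location of your own hive-site.xml.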

WestCoastProjects
  • Thanks for the reply. If possible, could you please point me to the right file? In the Shark Hive conf directory, there is a hive-env.sh file and a hive-default.xml file. Which one should I modify? – visakh May 12 '14 at 07:03
  • I updated my answer. You need to copy hive-default.xml to hive-site.xml and add the metastore connection parameters. – WestCoastProjects May 12 '14 at 07:08
  • Thanks for adding the details. As of now, I'm using a Derby metastore (which is the default one). Is that a problem? I have a faint memory of reading somewhere that Derby metastore will support only one user at a time. Do you think I need to change the metastore to MySQL to get shark-shell working? – visakh May 12 '14 at 07:32
  • Also, I found that there wasn't a hive-site.xml file in my Hive conf directory, so I renamed the template to hive-site.xml. Unfortunately, since then, Hive commands through shark have also stopped working, with this error: `Failed to start database 'metastore_db'`. Also, if I change from the Derby metastore to a MySQL one, am I going to lose any metadata or corrupt my tables? – visakh May 12 '14 at 07:36
  • You will likely want to change to MySQL or some other persistent database. H2 is another possibility. You will lose the existing tables and will have to re-create them. If you are set on sticking with Derby, I am not sure how that will work going forward, and I cannot help further there. – WestCoastProjects May 12 '14 at 07:39
  • Thanks. I installed a MySQL database as the metastore_db and things are fine now. – visakh May 12 '14 at 08:38