
I installed this Spark version: spark-1.6.1-bin-hadoop2.6.tgz.

Now when I start Spark with the ./spark-shell command I get these errors (it prints a lot of error lines, so I am only including the ones that seem important):

     Cleanup action completed
        16/03/27 00:19:35 ERROR Schema: Failed initialising database.
        Failed to create database 'metastore_db', see the next exception for details.
        org.datanucleus.exceptions.NucleusDataStoreException: Failed to create database 'metastore_db', see the next exception for details.
            at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:516)

        Caused by: java.sql.SQLException: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
            org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
            ... 128 more
        Caused by: ERROR XBM0H: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.


        Nested Throwables StackTrace:
        java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
  org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
            ... 128 more
        Caused by: ERROR XBM0H: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
            at org.apache.derby.iapi.error.StandardException.newException


        Caused by: java.sql.SQLException: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
            at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
            at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
            at 
            ... 128 more

        <console>:16: error: not found: value sqlContext
                 import sqlContext.implicits._
                        ^
        <console>:16: error: not found: value sqlContext
                 import sqlContext.sql
                        ^

        scala> 

I tried some configuration changes to fix this, which I found in other questions about the "value sqlContext not found" issue, such as:

/etc/hosts file:

    127.0.0.1   hadoophost localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.2.0.15   hadoophost

echo $HOSTNAME returns:

hadoophost

.bashrc file contains:

export SPARK_LOCAL_IP=127.0.0.1

But it doesn't work. Can you give me some help to understand why Spark is not starting correctly?

hive-default.xml.template

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
  <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
  <!-- WARNING!!! Any changes you make to this file will be ignored by Hive.   -->
  <!-- WARNING!!! You must make your changes in hive-site.xml instead.         -->

In the home folder I get the same errors:

[hadoopadmin@hadoop home]$ pwd
/home
[hadoopadmin@hadoop home]$ 

Folder permissions:

[hadoopdadmin@hadoop spark-1.6.1-bin-hadoop2.6]$ ls -la
total 1416
drwxr-xr-x. 12 hadoop hadoop    4096 .
drwxr-xr-x. 16 root   root      4096  ..
drwxr-xr-x.  2 hadoop hadoop    4096  bin
-rw-r--r--.  1 hadoop hadoop 1343562  CHANGES.txt
drwxr-xr-x.  2 hadoop hadoop    4096  conf
drwxr-xr-x.  3 hadoop hadoop    4096  data
drwxr-xr-x.  3 hadoop hadoop    4096  ec2
drwxr-xr-x.  3 hadoop hadoop    4096  examples
drwxr-xr-x.  2 hadoop hadoop    4096  lib
-rw-r--r--.  1 hadoop hadoop   17352  LICENSE
drwxr-xr-x.  2 hadoop hadoop    4096  licenses
-rw-r--r--.  1 hadoop hadoop   23529  NOTICE
drwxr-xr-x.  6 hadoop hadoop    4096  python
drwxr-xr-x.  3 hadoop hadoop    4096  R
-rw-r--r--.  1 hadoop hadoop    3359  README.md
-rw-r--r--.  1 hadoop hadoop     120  RELEASE
drwxr-xr-x.  2 hadoop hadoop    4096  sbin
codin
  • I am facing a similar issue; I can't run Spark 1.6 locally. How did you manage to solve this issue? – Mero Sep 27 '16 at 14:44

2 Answers

Apparently you don't have permission to write in that directory. I recommend that you run ./spark-shell from your HOME directory (you might want to add that command to your PATH), or from any other directory accessible and writable by your user.
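
For example, a minimal sketch (the install path and the hadoopadmin user are assumptions taken from the question's output, so adjust both to your setup):

    # Option A: launch spark-shell from a directory your user can write to,
    # e.g. your home directory, so Derby can create metastore_db there.
    cd ~
    /usr/local/spark-1.6.1-bin-hadoop2.6/bin/spark-shell

    # Option B: keep launching from the install directory, but first give your
    # user ownership of it (hadoopadmin is assumed from the question's prompt).
    sudo chown -R hadoopadmin:hadoopadmin /usr/local/spark-1.6.1-bin-hadoop2.6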

This might also be relevant for you: Notebooks together with Spark

Alberto Bonsanto

You are using Spark built with Hive support.

There are two possible solutions, depending on what you want to do later in your spark-shell or in your Spark jobs:

  1. You want to access Hive tables in your Hadoop+Hive installation. You should place hive-site.xml in your Spark installation's conf sub-directory (see the sketch after this list). Find hive-site.xml in your existing Hive installation; for example, in my Cloudera VM the hive-site.xml is at /usr/lib/hive/conf. Launching spark-shell after this step should connect to the existing Hive metastore and will not try to create a local metastore_db in your current working directory.
  2. You do NOT want to access Hive tables in your Hadoop+Hive installation. If you do not care about connecting to Hive tables, then you can follow Alberto's solution: fix the permission issues in the directory from which you are launching spark-shell, and make sure you are allowed to create directories/files there.
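
For option 1, a minimal sketch of the copy step (the Hive path is taken from the Cloudera VM example above and the Spark path from the question; adjust both to your installation):

    # Copy hive-site.xml from the existing Hive installation into Spark's conf
    # directory so spark-shell connects to the existing Hive metastore instead
    # of trying to create a local metastore_db in the working directory.
    cp /usr/lib/hive/conf/hive-site.xml /usr/local/spark-1.6.1-bin-hadoop2.6/conf/

    # Then launch spark-shell again.
    /usr/local/spark-1.6.1-bin-hadoop2.6/bin/spark-shell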

Hope this helps.

Pranav Shukla
  • Thanks for your answer. I'm trying your first point, but the error still continues. I'm using hive-1.2.1, I copied hive-default.xml.template, and the issue stays the same. – codin Mar 29 '16 at 15:36
  • @codin Please look for hive-site.xml. Spark will look for hive-site.xml in its conf directory. You can try renaming hive-default.xml to hive-site.xml in Spark's conf dir. – Pranav Shukla Mar 29 '16 at 16:12
  • Thanks again for answering, but that file doesn't appear in the Hive folder, and I didn't do any configuration there. To install and configure Hive I just extracted it to a local folder and set the Hive prefix in .bashrc. – codin Mar 29 '16 at 16:29
  • Ok, so you have also downloaded hive-1.2.1 manually? I.e. the version of Hive you are using did not come from any standard Hadoop distribution? Did you set up both Hadoop and Hive manually (i.e. without using any quickstart VM from Cloudera or Hortonworks, etc.)? – Pranav Shukla Mar 29 '16 at 17:20
  • Yes, Hadoop and Hive were installed manually; I just downloaded the tar files and configured them! – codin Mar 29 '16 at 19:12
  • If you are just doing a POC in your dev environment, I would recommend using a quickstart VM, as setting up Hadoop+Hive etc. can get really involved. You could still use your own version of Spark as you did (1.6); if you have correctly configured Hive, you should have a hive-site.xml. – Pranav Shukla Mar 29 '16 at 20:54
  • Thanks for your tip. But maybe that is not the best way to learn things well about the configuration and so on, or do you also gain that knowledge using a quickstart VM? – codin Mar 29 '16 at 21:10
  • A quickstart VM will help you learn because it has a fully working setup with all the configurations together, which you can take as a reference and reason about. Surely, doing everything manually is a great way to learn all the internals, but for a beginner in Hadoop it is intimidating to learn all the technologies and their configurations and then the actual programming with Spark or map-reduce. It is advisable to use a quickstart VM, at least to rule out any misconfigurations, so you can focus on learning the actual distributed computing. – Pranav Shukla Mar 30 '16 at 07:14
  • Thanks again for your help! And if I use a quickstart VM, does that VM already have Spark, or does Spark need to be installed? – codin Mar 31 '16 at 14:04
  • Most of the Hadoop distributions come with Spark packaged inside them. Cloudera's latest VM has a Spark 1.5.x version. All components within the VM are well tested to rule out any compatibility issues. If you want Spark 1.6 on that VM, you can download and extract it; you'll just need to add hive-site.xml to Spark's conf directory and you'll be ready to go. – Pranav Shukla Mar 31 '16 at 18:14
  • @codin thanks for accepting the answer. Hope your problem is fixed :-) – Pranav Shukla Apr 05 '16 at 11:32