Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database

Question

I am trying to run Spark SQL:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

But I get the error below:

        ... 125 more
Caused by: java.sql.SQLException: Another instance of Derby may have already booted the database /root/spark/bin/metastore_db.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
        ... 122 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /root/spark/bin/metastore_db.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)

I can see that a metastore_db folder exists.
My Hive metastore is configured to use MySQL, so I am not sure why the error shows a Derby exception.

Is your question: "I thought I was using mysql for my metastore_db, but Spark thinks I am using Derby. Why is that?" Or is your question: "I intended to use Derby for my metastore_db, but Spark is failing to open the database because some other Derby application already has the database open. Why is that?" — Bryan Pendleton, Dec 25 '15 at 21:31
No, I don't want to use Derby. It gets picked up automatically, and I'm not sure why this error occurs. – Amaresh, Dec 26 '15 at 14:43
This will help https://dataunbox.com/caused-by-error-xsdb6-another-instance-of-derby-may-have-already-booted-the-database-metastore_db/ — Amaresh, Jul 27 '20 at 12:22

score 37 · Answer 1 · answered Jul 13 '16 at 05:31

I was getting the same error while creating DataFrames in the Spark shell:

Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /metastore_db.

Cause:

I found that this happened because several other spark-shell instances were already running and holding the Derby database (embedded Derby lets only one JVM boot a given database at a time). So when I started yet another Spark shell and created a DataFrame in it with RDD.toDF(), it threw the error.

Solution:

I ran the ps command to find other instances of Spark-Shell:

ps -ef | grep spark-shell

and I killed them all using the kill command:

kill -9 <spark-shell-process-ID>   (example: kill -9 4848)

After all the spark-shell instances were gone, I started a new Spark shell and reran my DataFrame function, and it ran just fine :)
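
The ps step above can be sketched in Python. This is a minimal sketch, assuming the Linux `ps -ef` column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD). The function names `spark_shell_pids` and `list_running_spark_shells` are made up for illustration. It only reports candidate PIDs and skips the grep line that the search itself produces. Killing the processes is left as a deliberate manual step.

```python
import subprocess
from typing import List


def spark_shell_pids(ps_output: str) -> List[int]:
    """Return the PIDs of `ps -ef` lines whose command mentions spark-shell.

    Assumes the Linux `ps -ef` layout, where the PID is the second column
    and the command is everything from the eighth column onwards.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header row or a line we cannot parse
        command = fields[7]
        # Skip the grep process that was only searching for spark-shell.
        if "spark-shell" in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


def list_running_spark_shells() -> List[int]:
    """Run `ps -ef` and return the PIDs of running spark-shell processes."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return spark_shell_pids(result.stdout)
```

Calling `list_running_spark_shells()` gives the PIDs to inspect before deciding which ones to kill. Keeping the parsing in a separate function makes it possible to check it against saved `ps` output.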

answered Jul 13 '16 at 05:31

Dean Jain

Killing hive process and restarting it helped, Thanks! – VishnuVardhanA Apr 26 '17 at 06:10
ps -ef | grep spark-shell – Shyam Gupta May 24 '17 at 12:29
This helped me. – pranaygoyal02 Sep 05 '18 at 09:39
This. `spark-shell` was running even though I had killed `iTerm2`. That was an interesting phenomenon. – WestCoastProjects Feb 08 '19 at 17:49

score 14 · Answer 2 · answered Feb 05 '16 at 17:37

If you're running in the Spark shell, you shouldn't instantiate a HiveContext: one is created automatically and is called sqlContext (the name is misleading; if you compiled Spark with Hive, it will be a HiveContext). See similar discussion here.

If you're not running in the shell, this exception means you've created more than one HiveContext in the same JVM, which seems to be impossible: you can only create one.

answered Feb 05 '16 at 17:37

Tzach Zohar

But if the shell is doing so automatically, how could a user avoid it to work around the issue? – Holger Brandl Sep 05 '17 at 20:44
The user doesn't need to work around it - the cause for the error is the user trying to create _another_ context by calling `new org.apache.spark.sql.hive.HiveContext(sc)` - if you simply avoid doing that, you lose nothing (because you already have a HiveContext you can use) and overcome the error. – Tzach Zohar Sep 05 '17 at 20:48
The problem is that it seems to happen automatically with current spark v2.2. Just starting a second spark-shell on the same machine is sufficient to cause the error. – Holger Brandl Sep 06 '17 at 06:25

score 6 · Answer 3 · edited Jul 23 '19 at 07:34

An .lck (lock) file is an access-control file that locks the database so that only a single user can access or update it. The error suggests that another instance is using the same database, so you need to delete the .lck files. In your home directory, go to metastore_db and delete any .lck files.
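
The lock-file cleanup described above can be sketched in Python. This is a minimal sketch with hypothetical helper names (`find_lock_files`, `remove_lock_files`). It assumes the lock files sit directly inside the database directory, as Derby's db.lck and dbex.lck normally do. It defaults to a dry run, since deleting lock files while another process still has the database open is unsafe.

```python
from pathlib import Path
from typing import List


def find_lock_files(metastore_dir: str) -> List[Path]:
    """Return the *.lck files directly inside a Derby database directory."""
    return sorted(Path(metastore_dir).glob("*.lck"))


def remove_lock_files(metastore_dir: str, dry_run: bool = True) -> List[Path]:
    """Report the lock files, and delete them only when dry_run is False.

    Only do the real deletion after confirming that no other process
    is still using the database.
    """
    locks = find_lock_files(metastore_dir)
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```

For example, `remove_lock_files("metastore_db")` only lists what would be deleted, and `remove_lock_files("metastore_db", dry_run=False)` actually removes the files.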

edited Jul 23 '19 at 07:34

answered Jul 10 '19 at 10:47

ShyamSharma

With no knowledge of Hadoop, I wonder if this answer gives enough information to be useful. Consider editing it and adding info on what `.lck` files are. – Cleptus Jul 10 '19 at 11:20
this worked for me! Was trying to load elastic search – Sergei Wallace Oct 05 '21 at 18:47

newtover · Answer 4 · 2018-02-05T10:29:50.197

Another case where you can see the same error is the Spark REPL of an AWS Glue dev endpoint, when you try to convert a dynamic frame into a DataFrame.

There are actually several different exceptions like:

pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
ERROR XSDB6: Another instance of Derby may have already booted the database /home/glue/metastore_db.
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader

The solution is hard to find with Google, but it is described here.

The loaded REPL contains an instantiated SparkSession in a variable named spark, and you just need to stop it before creating a new SparkContext:

>>> spark.stop()
>>> from pyspark.context import SparkContext
>>> from awsglue.context import GlueContext
>>>
>>> glue_context = GlueContext(SparkContext.getOrCreate())
>>> glue_frame = glue_context.create_dynamic_frame.from_catalog(database=DB_NAME, table_name=T_NAME)
>>> df = glue_frame.toDF()

score 2 · Answer 5 · edited Aug 24 '17 at 14:52

If you hit this issue while bringing up a WAS (WebSphere Application Server) application on a Windows machine:

Kill the Java processes using Task Manager.
Delete the db.lck file in WebSphere\AppServer\profiles\AppSrv04\databases\EJBTimers\server1\EJBTimerDB (EJBTimerDB was the database causing the issue in my case).
Restart the application.

edited Aug 24 '17 at 14:52

Sandy Gifford

answered Aug 24 '17 at 14:41

user3007369

score 2 · Answer 6 · edited Nov 09 '17 at 07:32

I faced the same issue while creating a table.

sqlContext.sql("CREATE TABLE....

I could see many entries for ps -ef | grep spark-shell so I killed all of them and restarted spark-shell. It worked for me.

edited Nov 09 '17 at 07:32

Martin Evans

answered Nov 09 '17 at 07:06

Hegde

score 1 · Answer 7 · answered Oct 16 '17 at 11:00

This happened when I was using pyspark.ml Word2Vec and trying to load a previously built model. The trick is to create an empty DataFrame, in PySpark or Scala, using sqlContext. The Python syntax is as follows:

from pyspark.sql.types import StructType

# sc and sqlContext are the objects predefined by the pyspark shell
schema = StructType([])
empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)

This is a workaround; my problem was fixed after running this block. Note: the error only occurs when sqlContext is instantiated from HiveContext, not from SQLContext.

answered Oct 16 '17 at 11:00

Subhojit Mukherjee

Wow, that works great for me thanks, I was spending ages trying to load a Pipeline.model from pyspark.ml which I previously saved on another system. Do you know what causes this issue? (also minor thing: I tried to edit your answer to remove the stray backtick but StackOverflow wouldn't allow me since the edit was less than 6 characters). – Shane Halloran Nov 08 '17 at 16:28

score 0 · Answer 8 · answered Apr 08 '16 at 09:28

I got this error by running sqlContext._get_hive_ctx(). This came about when I initially tried to load a pipelined RDD into a DataFrame and got the error:

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o29))

So you could try running this before rebuilding Spark, but FYI, I have seen others report that this did not help them.

score 0 · Answer 9 · answered Nov 18 '20 at 04:55

I got this error while running test cases in my multi-module Maven Spark setup. I was creating a SparkSession separately in each test class, because the unit tests required different Spark parameters each time, which I pass in through a configuration file. To resolve this, I took the following approach when creating the SparkSession in Spark 2.2.0:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// This lives in my parent trait.
def createSparkSession(master: String, appName: String, configList: List[(String, String)]): SparkSession = {
  // Apply every (key, value) pair read from the configuration file.
  val sparkConf = new SparkConf().setAll(configList)
  SparkSession
    .builder()
    .master(master)
    .config(sparkConf)
    .enableHiveSupport()
    .appName(appName)
    .getOrCreate()
}

In my test classes

// metastore_db_test is a test-class-specific folder inside my module.
val metaStoreConfig = List(("javax.jdo.option.ConnectionURL", "jdbc:derby:;databaseName=hiveMetaStore/metastore_db_test;create=true"))
val configList = configContent.convertToListFromConfig(sparkConfigValue) ++ metaStoreConfig
val spark = createSparkSession("local[*]", "testing", configList)

After that, I clean up this hiveMetaStore directory with the Maven clean plugin.

Parent POM

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-clean-plugin</artifactId>
    <version>3.1.0</version>
    <configuration>
        <filesets>
            <fileset>
                <directory>metastore_db</directory>
            </fileset>
            <fileset>
                <directory>spark-warehouse</directory>
            </fileset>
        </filesets>
    </configuration>
</plugin>

Child Module POM

<plugin>
    <artifactId>maven-clean-plugin</artifactId>
    <configuration>
        <filesets>
            <fileset>
                <directory>hiveMetaStore</directory>
                <includes>
                    <include>**</include>
                </includes>
            </fileset>
            <fileset>
                <directory>spark-warehouse</directory>
            </fileset>
        </filesets>
    </configuration>
</plugin>
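
The per-test metastore location used in the test classes above can be sketched in Python as well. This is a minimal sketch, not the answer's own code: the helper names `derby_metastore_url` and `metastore_config` are hypothetical, and the `metastore_db_<name>` folder naming simply mirrors the `metastore_db_test` example. The point is that each test class gets its own embedded Derby database directory, so two sessions never try to boot the same one.

```python
from typing import List, Tuple


def derby_metastore_url(base_dir: str, test_name: str) -> str:
    """Build an embedded-Derby JDBC URL with a database folder per test class."""
    return (
        "jdbc:derby:;databaseName="
        f"{base_dir}/metastore_db_{test_name};create=true"
    )


def metastore_config(base_dir: str, test_name: str) -> List[Tuple[str, str]]:
    """Return the (key, value) pair that points the Hive metastore at that URL."""
    return [
        ("javax.jdo.option.ConnectionURL", derby_metastore_url(base_dir, test_name))
    ]
```

With `base_dir="hiveMetaStore"` and `test_name="test"`, this produces the same URL as the Scala snippet. In PySpark the resulting pairs could be passed one at a time to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.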

Is it working fine for multiple test cases? For one or two test cases it works fine, but when I run multiple suites I get this error. – Ram Ghadiyaram, Feb 13 '23 at 04:01

score -2 · Answer 10 · edited Jun 26 '17 at 17:34

The error occurs either because you are trying to run multiple Spark shells on the same node, or because a system failure shut a Spark shell down without a proper exit. In either case, find the process ID and kill the process. To do that, use:

[hadoop@localhost ~]$ ps -ef | grep spark-shell
hadoop    11121   9197  0 17:54 pts/0    00:00:00 grep --color=auto spark-shell
[hadoop@localhost ~]$ kill 9197

edited Jun 26 '17 at 17:34

Jozef Dúc

answered May 24 '17 at 12:36

Shyam Gupta

I'm a bit late to the party but in your example you just killed the process you used to look for processes. And even that was already gone at the point you used kill. – Jochen Ullrich Aug 09 '18 at 07:02

score -3 · Answer 11 · answered May 20 '17 at 19:57

It is very difficult to find out where your Derby metastore_db is being accessed by another thread. If you are able to find the process, you can kill it using the kill command.

The best solution is to restart the system.

answered May 20 '17 at 19:57

Shyam Gupta
