I was having the exact same problem on an AWS EMR cluster (emr-5.31.0).
Setting spark.driver.extraClassPath and spark.executor.extraClassPath in SparkSession.builder.config(), in spark-defaults.conf, or via the spark-submit --jars flag pointing at the location of ojdbc6.jar did not work.
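For reference, this is roughly what the builder-config variant that failed for me looked like (the local jar path is illustrative):

from pyspark.sql import SparkSession

# This did NOT pick up the driver on EMR for me; the jar path is illustrative
spark = (
    SparkSession
    .builder
    .config("spark.driver.extraClassPath", "/home/hadoop/ojdbc6.jar")
    .config("spark.executor.extraClassPath", "/home/hadoop/ojdbc6.jar")
    .getOrCreate()
)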
I finally got it to work by passing the Maven coordinates to spark.jars.packages, and then I also had to set spark.driver.extraClassPath and spark.executor.extraClassPath to $HOME/.ivy2/jars/*:
import os
from pyspark.sql import SparkSession

# Maven coordinates of the packages Spark should fetch at startup
spark_packages_list = [
    'io.delta:delta-core_2.11:0.6.1',
    'com.oracle.database.jdbc:ojdbc6:11.2.0.4',
]
spark_packages = ",".join(spark_packages_list)

home = os.getenv("HOME")

spark = (
    SparkSession
    .builder
    .config("spark.jars.packages", spark_packages)
    # Ivy downloads the packages to ~/.ivy2/jars; expose them on both class paths
    .config("spark.driver.extraClassPath", f"{home}/.ivy2/jars/*")
    .config("spark.executor.extraClassPath", f"{home}/.ivy2/jars/*")
    .getOrCreate()
)
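You can sanity-check that the settings actually took effect by reading them back from the live SparkConf (a quick optional check, not part of the original recipe):

# Optional sanity check: confirm the settings were applied
for key in ("spark.jars.packages",
            "spark.driver.extraClassPath",
            "spark.executor.extraClassPath"):
    print(key, "=", spark.sparkContext.getConf().get(key, "<not set>"))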
Then the following worked (change parameters accordingly):
host = "111.111.111.111"
port = "1234"
schema = "YourSchema"
URL = f"jdbc:oracle:thin:@{host}:{port}/{schema}"

# Read credentials from files; strip() drops any trailing newline
with open(f"{home}/username.file", "r") as f:
    username = f.read().strip()
with open(f"{home}/password.file", "r") as f:
    password = f.read().strip()

query = "SELECT * FROM YourTable"

df = (
    spark.read.format("jdbc")
    .option("url", URL)
    .option("query", query)
    .option("user", username)
    .option("password", password)
    .load()
)

df.printSchema()
df.show()
Or, equivalently, with spark.read.jdbc():
properties = {
    "user": username,
    "password": password,
}

df = spark.read.jdbc(
    url=URL,
    table="YourTable",
    properties=properties,
)

df.printSchema()
df.show()
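One difference to be aware of: the query option takes a bare SELECT, while spark.read.jdbc() takes a table name. If you need to filter with the second form, you can pass a parenthesized subquery aliased as a table (the subquery below is illustrative):

# Spark treats a parenthesized, aliased subquery like a table name
subquery = "(SELECT * FROM YourTable WHERE ROWNUM <= 100) t"
df_sub = spark.read.jdbc(url=URL, table=subquery, properties=properties)
df_sub.show()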