
While trying to read data from an Oracle database using Spark on AWS EMR, I am getting this error message:

java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver

Has anyone faced this issue, and if so, how did you resolve it?

pyspark --driver-class-path /home/hadoop/ojdbc7.jar --jars /home/hadoop/ojdbc7.jar

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

# sc is created automatically by the pyspark shell
sqlContext = SQLContext(sc)

df = sqlContext.read.format("jdbc").options(
    url="jdbc:oracle:thin:user/pass@//10.200.100.142:1521/BMD",
    driver="oracle.jdbc.driver.OracleDriver",
    dbtable="S_0COORDER_TEXT_D",
).load()

4 Answers


Although you haven't mentioned which version of Spark you are using, you can try the following.

Import the jars for both the driver and the executors. To do so, edit conf/spark-defaults.conf, adding both lines below:

spark.driver.extraClassPath /home/hadoop/ojdbc7.jar
spark.executor.extraClassPath /home/hadoop/ojdbc7.jar

Alternatively, you can pass them while submitting the job, as in the example below:

--conf spark.driver.extraClassPath=/home/hadoop/ojdbc7.jar
--conf spark.executor.extraClassPath=/home/hadoop/ojdbc7.jar
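
For example, a full pyspark shell launch that distributes the jar and sets both classpaths might look like this (the jar path is taken from the question; adjust it for your environment):

pyspark --jars /home/hadoop/ojdbc7.jar \
  --conf spark.driver.extraClassPath=/home/hadoop/ojdbc7.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/ojdbc7.jar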
  • We are using Spark 2.0.0. I have added the jars to both driver and executor in conf/spark-defaults.conf, but I am still facing the same issue: py4j.protocol.Py4JJavaError: An error occurred while calling o54.load. : java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver – KiranK May 01 '17 at 16:34
  • I have tried pyspark --jars /home/hadoop/ojdbc7.jar --conf spark.driver.extraClassPath=/home/hadoop/ojdbc7.jar --conf spark.executor.extraClassPath=/home/hadoop/ojdbc7.jar, and still hit the same issue. – KiranK May 01 '17 at 17:01
  • The same settings you suggested are working for a different team but not for me, which is weird. – KiranK May 09 '17 at 19:50
  • There might be an issue specific to your environment; these settings should work. Since they are working for the other team you mentioned, please accept the answer. – Ram Ghadiyaram May 11 '17 at 05:13

Add the lines below to your_spark_home_path/conf/spark-defaults.conf ('/opt/modules/extraClass/' is the directory where I put extra jars):

spark.driver.extraClassPath /opt/modules/extraClass/ojdbc7.jar
spark.executor.extraClassPath /opt/modules/extraClass/ojdbc7.jar

Or you can simply add ojdbc7.jar to your_spark_home_path/jars.
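
As a minimal sketch of that second option (assuming the jar was downloaded to /home/hadoop as in the question, and $SPARK_HOME points at your Spark installation):

# copy the driver into the directory that every Spark JVM puts on its classpath
cp /home/hadoop/ojdbc7.jar $SPARK_HOME/jars/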

  • @BdEngineer That's because your executors may not have the JDBC jar present. Run your spark-submit command with the `--jars /path/to/jdbc.jar` argument, which distributes the jar to all the executors. – mk7 Nov 27 '19 at 02:25

I had the exact same problem on an AWS EMR cluster (emr-5.31.0).

Setting spark.driver.extraClassPath and spark.executor.extraClassPath in SparkSession.builder.config(), or in spark-defaults.conf, or with spark-submit --jars pointing to the location of ojdbc6.jar did not work.

I finally got it to work by passing the Maven coordinates to spark.jars.packages; I then also had to set spark.driver.extraClassPath and spark.executor.extraClassPath to $HOME/.ivy2/jars/*.

import os
from pyspark.sql import SparkSession

spark_packages_list = [
    'io.delta:delta-core_2.11:0.6.1',
    'com.oracle.database.jdbc:ojdbc6:11.2.0.4',
]
spark_packages = ",".join(spark_packages_list)

home = os.getenv("HOME")

spark = (
    SparkSession
    .builder
    .config("spark.jars.packages", spark_packages)
    .config("spark.driver.extraClassPath", f"{home}/.ivy2/jars/*")
    .config("spark.executor.extraClassPath", f"{home}/.ivy2/jars/*")
    .getOrCreate()  # materialize the session; without this, spark is just a builder
)

Then the following worked (change parameters accordingly):

host = "111.111.111.111"
port = "1234"
schema = "YourSchema"
URL = f"jdbc:oracle:thin:@{host}:{port}/{schema}"

with open(f"{home}/username.file", "r") as f:
    username = f.read()

with open(f"{home}/password.file", "r") as f:
    password = f.read()

query = "SELECT * FROM YourTable"

df = (spark.read.format("jdbc")
    .option("url", URL)
    .option("query", query)
    .option("user", username)
    .option("password", password)
    .load()
)

df.printSchema()
df.show()

Or, equivalently, with spark.read.jdbc:

properties = {
    "user": username,
    "password": password,
}

df = spark.read.jdbc(
    url=URL,
    table="YourTable",
    properties=properties,
)

df.printSchema()
df.show()

To solve this problem, just add a suitable ojdbc jar driver to:

$SPARK_HOME/jars/

Then, if the exception changes to something like this:

py4j.protocol.Py4JJavaError: An error occurred while calling o52.load. : java.sql.SQLException: ORA-28040: No matching authentication protocol

it is because of the ojdbc driver version conflict mentioned here:

https://stackoverflow.com/questions/24100117/ora-28040-no-matching-authentication-protocol-exception

then simply change to a newer version of the driver.
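
As a minimal sketch of that swap (the ojdbc8 Maven coordinates and version below are assumptions; pick the release that matches your Oracle server):

from pyspark.sql import SparkSession

# Pull a newer driver by its Maven coordinates instead of shipping the old jar;
# ojdbc8 21.5.0.0 is an assumed version - match it to your database release.
spark = (
    SparkSession
    .builder
    .config("spark.jars.packages", "com.oracle.database.jdbc:ojdbc8:21.5.0.0")
    .getOrCreate()
)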
