0

I am loading some data from oracle db and i am trying to do count action on the data.But not getting any result and eventually getting session timeout. If i perform any other functions like first(),show() i am getting result .

I tried launching scala shell with different options like below.

 spark2-shell --jars /oracle/12c/product/12.1.0/client/jdbc/lib/ojdbc6.jar --executor-memory 10G  --num-executors 10 --driver-memory 2G yarn-client 

 spark2-shell --jars /oracle/12c/product/12.1.0/client/jdbc/lib/ojdbc6.jar 

val df = spark.read
.format("jdbc")
.option("url","jdbc:oracle:thin:@//MY_IP:MY_PORT/MY_SERVICE_NAME")
.option("dbtable","MY_TABLE")
.option("query", "select 'a','6b',count(*) from MY_TABLE_NAME partition 
for (to_date('07-MAY-2019','DD-MON-YYYY')) where COL='SOMETHING'")
.option("driver","oracle.jdbc.driver.OracleDriver")
.option("user", "MY_USERNAME")
.option("password", "MYPASSWORD")
.load()


 df.count()

This is not providing any result

I am expecting this count function to give the total count of record got dumped from the table.

1 Answers1

1

There is no such option as query so your code fetches all the data from the external table, furthermore it does that using only a single partition.

The correct option is dbtable and the query, as per linked answer, should be provided as a subquery:

.option("dbtable", "(select 'a','6b',count(*) from MY_TABLE_NAME partition 
  for (to_date('07-MAY-2019','DD-MON-YYYY')) where COL='SOMETHING') as t")
  • thanks for this response.i created the query as suggested and worked for me.Yes the issue seems to be that table was dumping full data and count is taking much time to work on full data. – Jayanth Venkat Jul 06 '19 at 17:48