
I have written a program using PySpark to connect to an Oracle database and fetch data. The command below works fine and returns the contents of the table:

sqlContext.read.format("jdbc") \
    .option("url", "jdbc:oracle:thin:user/password@dbserver:port/dbname") \
    .option("dbtable", "SCHEMA.TABLE") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .load().show()

Now I do not want to load the entire table; I only want to load selected records. Can I specify a SELECT query as part of this command? If yes, how?

Note: I know I could load the full table into a DataFrame and run a select/filter on top of it, but I do not want to do that. Please help!

SANKALP VERMA

2 Answers


You can use a subquery in the dbtable option:

.option("dbtable", "(SELECT * FROM tableName) AS tmp where x = 1")

Here is a similar question, but about MySQL.
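
For example, here is a minimal sketch of the full PySpark call with the subquery passed through dbtable (the connection string, SCHEMA.TABLE and the column x are placeholders carried over from the question and the snippet above, not a tested setup):

# The filtering happens inside Oracle, so only matching rows are fetched.
query = "(SELECT * FROM SCHEMA.TABLE WHERE x = 1) tmp"

sqlContext.read.format("jdbc") \
    .option("url", "jdbc:oracle:thin:user/password@dbserver:port/dbname") \
    .option("dbtable", query) \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .load() \
    .show()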

T. Gawęda

In general, the optimizer SHOULD be able to push down any relevant select and where elements, so if you do df.select("a", "b", "c").where("d < 10"), this should in general be pushed down to Oracle. You can check it by calling df.explain(True) on the final DataFrame.
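
A rough sketch of that check, reusing the reader from the question (connection details and the column names a, b, c, d are placeholders); the physical plan printed by explain should show the pushed projection and filter on the JDBC scan:

df = sqlContext.read.format("jdbc") \
    .option("url", "jdbc:oracle:thin:user/password@dbserver:port/dbname") \
    .option("dbtable", "SCHEMA.TABLE") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .load()

# Projection and filter that Spark can push down to the database.
result = df.select("a", "b", "c").where("d < 10")

# Prints the logical and physical plans; look for the pushed filter
# on the JDBC relation in the physical plan.
result.explain(True)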

Assaf Mendelson