
I'm trying to access a SQL database using JdbcRDD. I would like to page through my query results using ROW_NUMBER() and OVER (ORDER BY ...). Since I know the upper bound of my query, I would like to run, for instance, one query per page, each on its own executor. I thought the numPartitions argument of the JdbcRDD constructor would do that for me, but the queries run in sequence rather than in parallel, which takes longer.

Any ideas, or even just hints on how to do this, would be appreciated!

Thx

  • How many tasks are in the job that is created? If it has one task per partition, then I highly doubt they would be executed sequentially (that is, if you have more than one executor). The executors pick up tasks independently, so I don't see how they could conspire to execute them in sequence. – Daniel Darabos Dec 05 '14 at 13:24
  • Might be worth taking a look at http://stackoverflow.com/questions/24916852/how-can-i-connect-to-a-postgresql-database-into-apache-spark-using-scala/24929536#24929536 to see if it contains anything useful for you. – Daniel Darabos Dec 05 '14 at 13:24
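
To make the "one query per page" idea concrete: JdbcRDD expects the SQL text to contain two `?` placeholders and runs the query once per partition, binding an inclusive sub-range of [lowerBound, upperBound] to them. The sketch below only illustrates that arithmetic in plain Java, with no Spark dependency. The class and method names (`PageRanges`, `split`) are hypothetical, and the formula is an assumption chosen to reproduce the example in the JdbcRDD documentation (bounds 1 to 20 with 2 partitions give (1, 10) and (11, 20)); it is not copied from Spark.

```java
// Hypothetical helper: not part of Spark. It only shows how an inclusive
// row-number range could be cut into one sub-range ("page") per partition.
class PageRanges {

    // Splits the inclusive range [lower, upper] into numPartitions contiguous,
    // inclusive sub-ranges. Assumes upper >= lower and that the arithmetic
    // fits in a long (no overflow handling in this sketch).
    static long[][] split(long lower, long upper, int numPartitions) {
        long length = upper - lower + 1;
        long[][] ranges = new long[numPartitions][2];
        for (int i = 0; i < numPartitions; i++) {
            ranges[i][0] = lower + (i * length) / numPartitions;           // first row of page i
            ranges[i][1] = lower + ((i + 1) * length) / numPartitions - 1; // last row of page i
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each pair would be bound to the two '?' placeholders of a paging
        // query, e.g. "... WHERE rn >= ? AND rn <= ?" over a ROW_NUMBER() column.
        for (long[] r : split(1, 100, 4)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

If the job shows one task per sub-range, whether those tasks actually run at the same time depends on how many executor cores are available to the application, which fits the question in the comment above.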

0 Answers