I'm trying to get a feel for what is going on inside Spark, and here's my current confusion. I'm trying to read the first 200 rows of an Oracle table into Spark:
val jdbcDF = spark.read.format("jdbc").options(
  Map("url" -> "jdbc:oracle:thin:...",
      "dbtable" -> "schema.table",
      "fetchSize" -> "5000",
      "partitionColumn" -> "my_row_id",
      "numPartitions" -> "16",
      "lowerBound" -> "0",
      "upperBound" -> "9999999"
  )).load()
jdbcDF.limit(200).count()
I would expect this to be fairly quick; a similar action on a table with 500K rows completes in a reasonable time. In this particular case the table is much bigger (hundreds of millions of rows), but I'd think limit(200) would still make it fast? How do I go about figuring out where it's spending its time?
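For what it's worth, the only thing I've thought to try so far is printing the query plans to see whether the limit actually gets pushed down into the query sent to Oracle, roughly like this (a minimal sketch using the jdbcDF defined above):

// Print the parsed/analyzed/optimized logical plans and the physical plan,
// to check whether the LIMIT 200 ends up in the generated Oracle query
// or is applied only after the full partitions are scanned
jdbcDF.limit(200).explain(true)

Is that the right way to investigate this, or should I be looking somewhere else (e.g. the Spark UI)?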