I am running a query in my Spark application that returns a substantially large amount of data. I would like to know how many rows of data are being queried for logging purposes. I can't seem to find a way to get the number of rows without either manually counting them, or calling a method to count for me, as the data is fairly large this gets expensive for logging. Is there a place that the rowcount is saved and available to grab?
I have read here that the Python connector saves the rowcount into the object model, but i can't seem to find any equivalent for the Spark Connector or its underlying JDBC.
The most optimal way I can find is rdd.collect().size
on the RDD that Spark provides. It is about 15% faster than calling rdd.count()
Any help is appreciated