
I know the count action can be expensive in Spark, so to improve performance I'd like a different way to check whether a query returns any results.

Here is what I did:

val df = spark.sql("select * from table_name where condition = 'blah' limit 1")
val dfEmpty = df.head(1).isEmpty

Is this a valid solution, or could there be an uncaught error if I use it to check the query result? It is a lot faster, though.

Dreamer
  • Possible duplicate of [How to check if spark dataframe is empty?](https://stackoverflow.com/questions/32707620/how-to-check-if-spark-dataframe-is-empty) – user10938362 May 19 '20 at 21:03
  • Not exactly a duplicate. The user already knew this can be done; the question is whether it is efficient, or whether there is a better alternative. – user3190018 May 19 '20 at 22:21

2 Answers


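Dataset.isEmpty only needs the first row of the data (its implementation applies limit(1) before counting), so it is a quite reasonable way to check whether a result is empty. It has been part of the Spark API since 2.4.0 and is optimized for exactly this, so I'd prefer it.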

Also, I don't think the limit 1 in your query is required.


  /**
   * Returns true if the `Dataset` is empty.
   *
   * @group basic
   * @since 2.4.0
   */
  def isEmpty: Boolean = withAction("isEmpty", limit(1).groupBy().count().queryExecution) { plan =>
    plan.executeCollect().head.getLong(0) == 0
  }
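A minimal sketch of the check without limit 1 in the SQL, assuming Spark 2.4 or later (where Dataset.isEmpty exists) and a local session. The object QueryEmptyCheck, the helper hasRows, and the temp view standing in for table_name are hypothetical illustrations, not part of Spark or the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object QueryEmptyCheck {
  // Hypothetical helper: true when the query returns at least one row.
  // Dataset.isEmpty applies limit(1) internally, so the SQL needs no "limit 1".
  def hasRows(df: DataFrame): Boolean = !df.isEmpty

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder().master("local[1]").appName("empty-check").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for table_name from the question
    Seq(("a", "blah"), ("b", "other")).toDF("id", "condition").createOrReplaceTempView("table_name")

    val matching = spark.sql("select * from table_name where condition = 'blah'")
    val noMatch = spark.sql("select * from table_name where condition = 'missing'")
    println(hasRows(matching)) // true
    println(hasRows(noMatch))  // false

    spark.stop()
  }
}
```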
Ram Ghadiyaram
  • Can you help and suggest how to handle this? https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename – BdEngineer May 27 '20 at 06:46

I think this is OK. You could also omit the limit(1), because it is already part of the implementation of df.isEmpty. See also How to check if spark dataframe is empty?

Note that df.isEmpty may not evaluate all columns. For example, if one column is computed by a UDF, that UDF will probably not be executed, so an exception it would throw on a real query goes unnoticed. df.head(1).isEmpty, on the other hand, evaluates all columns for one row.
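A minimal sketch of this difference, assuming Spark 2.4 or later running locally. The always-failing UDF failingUdf and the object and helper names are hypothetical stand-ins for a buggy real-world UDF. Whether isEmpty skips the UDF depends on the optimizer pruning the unused column, so treat that result as likely rather than guaranteed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}
import scala.util.Try

object UdfEvaluationCheck {
  // Hypothetical UDF that throws for every non-negative input
  val failingUdf = udf { (x: Long) =>
    if (x >= 0) throw new IllegalStateException(s"UDF failed on $x")
    x
  }

  // Hypothetical query: a valid range plus one column computed by the failing UDF
  def withFailingColumn(spark: SparkSession): DataFrame =
    spark.range(5).toDF("id").withColumn("bad", failingUdf(col("id")))

  // head(1) materializes every column of one row, so the UDF runs and the error surfaces
  def headCheck(df: DataFrame): Try[Boolean] = Try(df.head(1).isEmpty)

  // isEmpty only needs to know whether a row exists; the unused UDF column may be pruned
  def isEmptyCheck(df: DataFrame): Try[Boolean] = Try(df.isEmpty)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("udf-check").getOrCreate()
    val df = withFailingColumn(spark)
    println(s"head(1).isEmpty -> ${headCheck(df)}")    // a Failure: the UDF ran and threw
    println(s"isEmpty         -> ${isEmptyCheck(df)}") // likely Success(false)
    spark.stop()
  }
}
```

In other words, head(1).isEmpty behaves more like the real query would, at the cost of computing one full row.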

Raphael Roth