
How do I check whether a DataFrame (Scala) is empty in the fastest way? I use `DF.limit(1).rdd.isEmpty`, which is faster than `DF.rdd.isEmpty`, but it is not ideal. Is there a better way to do this?
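
For reference, a minimal sketch of the two checks mentioned above (the value names are just illustrative, and `df` is assumed to be an existing DataFrame):

// Both checks trigger a job, but limit(1) cuts the query down to a single
// row before converting to an RDD, which is why it tends to be faster.
val viaFullRdd  = df.rdd.isEmpty          // may scan far more data than needed
val viaLimitOne = df.limit(1).rdd.isEmpty // needs at most one row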

yjxyjx
    Does this answer your question? [How to check if spark dataframe is empty?](https://stackoverflow.com/questions/32707620/how-to-check-if-spark-dataframe-is-empty) – user3370741 Aug 03 '21 at 16:49

1 Answer


I usually wrap a call to `first` in a `Try`:

import scala.util.Try

val t = Try(df.first) // first throws NoSuchElementException if df is empty

From there you can pattern match on whether it is a `Success` or a `Failure` to control your logic:

import scala.util.{Success, Failure}

t match {
  case Success(row) =>
    // the DataFrame is non-empty; row holds its first Row
    // do stuff with the DataFrame

  case Failure(e) =>
    // the DataFrame is empty; do other stuff
    // e.getMessage will return the exception message
}
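
If you need this check in more than one place, the same idea can be wrapped in a small helper. A minimal sketch (the name `isEmptyDF` is just illustrative):

import scala.util.Try
import org.apache.spark.sql.DataFrame

// Turns the Success/Failure of first() into a Boolean emptiness test.
def isEmptyDF(df: DataFrame): Boolean = Try(df.first).isFailure

// usage: if (isEmptyDF(df)) { /* handle the empty case */ }
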
Ton Torres
  • I previously used `df.first`, but I found it slower than `limit(1)`. Why is that? – yjxyjx May 03 '16 at 09:48
  • Oh, I'm sorry; I tested `df.first`. If the DataFrame is empty, it throws this error: java.util.NoSuchElementException: next on empty iterator – yjxyjx May 03 '16 at 12:00
  • Oops, my mistake; I meant `Try` instead of `Option`. I've updated my answer. – Ton Torres May 05 '16 at 00:31
  • Thanks, but is its performance better than `DF.limit(1).rdd.isEmpty`? – yjxyjx May 05 '16 at 13:07
  • Looking at the source code of `limit` and `head`, it looks like `head` calls `limit(1)`, so if there ever was a difference I doubt it would be anything significant in this instance. Still, `df.head` is cleaner and easier to understand (for me, at least). – Ton Torres May 06 '16 at 00:19