3

I am hoping to generate an explain/execution plan in Spark 2.2 for a query that ends in an action on a DataFrame. The goal is to confirm that partition pruning is occurring as expected before I kick off the job and consume cluster resources. I searched the Spark documentation and SO but couldn't find a syntax that worked for my situation.

Here is a simple example that works as expected:

scala> List(1, 2, 3, 4).toDF.explain
== Physical Plan ==
LocalTableScan [value#42]

Here's an example that does not work, but that I am hoping to get working:

scala> List(1, 2, 3, 4).toDF.count.explain
<console>:24: error: value explain is not a member of Long
List(1, 2, 3, 4).toDF.count.explain
                               ^

And here's a more detailed example to illustrate the end goal: confirming, via the explain plan, that partition pruning happens for a query like this:

val newDf = spark.read.parquet(path).filter(s"start >= ${startDt}").filter(s"start <= ${endDt}")
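For the pruning check itself, no action is needed at all: calling `explain` on the (lazy) filtered DataFrame already prints the physical plan. A sketch, assuming a hypothetical path `/data/events` and a partition column named `start`:

```scala
// Sketch (hypothetical path and partition column): inspect the plan
// of the lazy DataFrame before running any action on it.
val startDt = "2018-01-01"
val endDt   = "2018-01-31"

val newDf = spark.read.parquet("/data/events")       // partitioned by `start`
  .filter(s"start >= '${startDt}'")
  .filter(s"start <= '${endDt}'")

newDf.explain
// If pruning kicks in, the FileScan node of the physical plan lists the
// predicates under PartitionFilters rather than as a post-scan Filter.
```

Note the quoting in the filter strings: if `start` is a string/date partition column, the literals need to be quoted inside the SQL expression or the comparison will not parse as intended.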

Thanks in advance for any thoughts/feedback.

user9074332
    `explain` is a Spark SQL method offered as part of the Dataset/DataFrame API. When you do something like `count` or `collect`, it returns a `Long` or an `Array` respectively, and those types don't have `explain` as a member. – Sivaprasanna Sethuraman May 29 '18 at 05:27

1 Answer

3

The `count` method is eagerly evaluated and, as you've seen, returns a `Long`, so there is no execution plan available on its result.

You have to use a lazy transformation, either:

import org.apache.spark.sql.functions.count

df.select(count($"*")).explain

or

df.groupBy().agg(count($"*")).explain
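If you'd rather not rewrite the query, the plans are also reachable through the Dataset's `queryExecution` field, which exposes the logical and physical plans of any lazy DataFrame. A sketch, where `df` stands for any DataFrame you have built:

```scala
// Sketch: inspect the plans directly instead of calling explain.
// queryExecution is available on every Dataset/DataFrame.
import org.apache.spark.sql.functions.count

val agg = df.groupBy().agg(count($"*"))
println(agg.queryExecution.optimizedPlan)  // optimized logical plan
println(agg.queryExecution.executedPlan)   // physical plan, what explain prints
```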
Alper t. Turker