df = spark.read.format('csv').load('...')

It is my understanding that load is a transformation and is executed only when an action is called. However, when the load statement runs, it shows up in the Spark UI as if it were an action.

Edit:

From the comments and answers, I inferred that load may or may not be a transformation, but it is definitely not an action, which is great and understandable.

If it is not an action, why is it creating a DAG? A DAG is created for the load statement alone, not just a WholeStageCodegen (which is in the SQL tab). Please see the image below: [Screenshot]

j raj
  • It is a transformation – pissall Oct 15 '19 at 10:57
  • https://stackoverflow.com/questions/56818629/what-does-load-do-in-spark – thebluephantom Oct 15 '19 at 11:25
  • Thank you for the response. I see from the shared link that it is a transformation and takes some time to execute because it does metadata checks and so on. My other question still stands unanswered: why is a simple load statement creating a DAG, which should not happen? – j raj Oct 15 '19 at 11:41
  • The thing in the UI is simply WholeStageCodegen, not an action. Your question is not that specific in relation to your comment. I grant you it is a little fuzzy. See the linked question's accepted answer; it is also vague, though. – thebluephantom Oct 15 '19 at 12:43
  • You should ask a new question. – thebluephantom Oct 15 '19 at 15:10

2 Answers


Specifically, based on your comments:

load by itself does nothing. It is just part of the sqlContext.read or spark.read.format API, a parameter that can be set directly or indirectly on the read; read is what allows data formats to be specified.

The DF or underlying RDD is evaluated lazily as they say.
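A minimal sketch of what "evaluated lazily" means, written in plain Python rather than Spark so it runs anywhere. `LazyDataset`, `read_source` and the other names are hypothetical and are not PySpark's API: transformations such as `map` and `filter` only append a step to a recorded plan, and only an action such as `collect` or `count` actually reads the source.

```python
from typing import Callable, Iterable, List, Tuple


class LazyDataset:
    """Toy model of lazy evaluation (hypothetical; not Spark's API)."""

    def __init__(self, source: Callable[[], Iterable[int]], steps: Tuple = ()):
        self._source = source  # how to produce the raw rows; not called yet
        self._steps = steps    # recorded transformations, applied only by an action

    # Transformations: return a new, longer plan and run nothing.
    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[int], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Actions: read the source and apply every recorded step in order.
    def collect(self) -> List[int]:
        rows = list(self._source())
        for kind, fn in self._steps:
            rows = [fn(r) for r in rows] if kind == "map" else [r for r in rows if fn(r)]
        return rows

    def count(self) -> int:
        return len(self.collect())


reads = []  # one entry per time the toy data source is actually read


def read_source() -> Iterable[int]:
    reads.append("read")
    return range(10)


# Chaining transformations only builds a plan; the source is untouched.
plan = LazyDataset(read_source).filter(lambda x: x % 2 == 0).map(lambda x: x * 10)
reads_before_action = len(reads)  # 0

# The action is what finally triggers the read.
result = plan.collect()           # [0, 20, 40, 60, 80]
reads_after_action = len(reads)   # 1
```

Every action re-reads the source in this toy, which is roughly why Spark offers caching; the sketch deliberately leaves that out.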

thebluephantom

load is neither an action nor a transformation; it is a method of the DataFrameReader class that describes how to load data from an external data source.

All methods of DataFrameReader merely describe the process of loading data and do not trigger a Spark job (until an action is called).

This is mentioned by jaceklaskowski; please read https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-DataFrameReader.html#methods

You can also refer to the transformation and action API list from Databricks here: https://training.databricks.com/visualapi.pdf. load is not listed there as either a transformation or an action.
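A hedged sketch, in plain Python rather than Spark, of the builder pattern DataFrameReader follows: `format` and `schema` only record settings, and `load` returns a description whose rows are read only by an action. `ToyReader`, `ToyFrame` and `open_source` are hypothetical names. The sketch also models one assumption relevant to the question: when no schema is supplied, a reader may have to peek at the data inside `load` to learn the columns. This is the kind of metadata check mentioned in the asker's comment, and it would show up as work before any action.

```python
import io
from typing import Callable, List, Optional, TextIO


class ToyFrame:
    """Hypothetical lazy result of ToyReader.load(); rows are read only by collect()."""

    def __init__(self, open_source: Callable[[], TextIO], columns: List[str]):
        self._open = open_source
        self.columns = columns

    def collect(self) -> List[dict]:
        # Action: read the data rows (the toy always treats line 1 as a header).
        with self._open() as f:
            lines = f.read().splitlines()[1:]
        return [dict(zip(self.columns, line.split(","))) for line in lines]


class ToyReader:
    """Hypothetical builder loosely modelled on DataFrameReader: calls only record settings."""

    def __init__(self, open_source: Callable[[], TextIO]):
        self._open = open_source
        self._format: Optional[str] = None
        self._schema: Optional[List[str]] = None

    def format(self, fmt: str) -> "ToyReader":
        self._format = fmt
        return self

    def schema(self, columns: List[str]) -> "ToyReader":
        self._schema = list(columns)
        return self

    def load(self) -> ToyFrame:
        columns = self._schema
        if columns is None:
            # No schema given: peek at the first line to learn the column names.
            # This is eager metadata work that happens before any action.
            with self._open() as f:
                columns = f.readline().strip().split(",")
        return ToyFrame(self._open, columns)


DATA = "name,age\nann,31\nbob,27\n"
opens = []  # one entry per time the toy "file" is opened


def open_source() -> TextIO:
    opens.append("open")
    return io.StringIO(DATA)


# With a schema, load() only builds a description; the file is not opened.
df1 = ToyReader(open_source).format("csv").schema(["name", "age"]).load()
opens_after_schema_load = len(opens)     # 0

# Without a schema, load() itself opens the file to discover the columns.
df2 = ToyReader(open_source).format("csv").load()
opens_after_inferring_load = len(opens)  # 1

rows = df1.collect()  # the action reads the data rows
```

In real PySpark the analogous setting is `DataFrameReader.schema(...)`, as in `spark.read.format('csv').schema(...).load('...')`. Whether supplying it removes every job visible in the Spark UI depends on the data source and Spark version, so treat this as something to check rather than a guarantee.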

Strick