
When creating a temporary in-memory table with createOrReplaceTempView, how and where is that table stored on the nodes?

Is the whole table created on every worker node, or on the master node? Or is the data partitioned and distributed across all cluster nodes?

Finally, is it a good idea to load a huge table with 100 million+ records into memory using createOrReplaceTempView?

user3190018
Arijeet Saha
    Possible duplicate of [How createOrReplaceTempView works in Spark?](https://stackoverflow.com/questions/44011846/how-createorreplacetempview-works-in-spark) – Ram Ghadiyaram Oct 24 '17 at 06:22
    My question is different. How is the temp table data distributed, and what about loading a huge number of records using createOrReplaceTempView? – Arijeet Saha Oct 24 '17 at 08:23

1 Answer


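PySpark SQL views are lazily evaluated. createOrReplaceTempView only registers the DataFrame's query plan under a name in the current SparkSession; it does not load the table into memory on the driver or on any worker. The data is computed only when an action runs against the view, and it is kept in memory only if you cache the dataset, for example with the cache() method. When cached, the data is held partition by partition in the executors' memory across the cluster, not as a full copy on every node. So registering a view over a table with 100 million+ records is cheap in itself; whether caching it is a good idea depends on the available executor memory and on how often the view is reused.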

Source: https://sparkbyexamples.com/pyspark/pyspark-createorreplacetempview/

Codistan