If i read a file in pyspark:
Data = spark.read(file.csv)
Then for the life of the spark session, the ‘data’ is available in memory,correct? So if i call data.show() 5 times, it will not read from disk 5 times. Is it correct? If yes, why do i need:
Data.cache()