I am trying to persist two very large DataFrames before performing a join, to work around the "java.util.concurrent.TimeoutException: Futures timed out..." issue (ref: Why does join fail with "java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]"?).
Calling persist() with no arguments works, but when I try to specify a storage level, I get name errors.
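For reference, the call that succeeds is just the no-argument form:

```python
# Works: persist with the default storage level.
df.persist()
```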
I've tried the following:
```python
df.persist(pyspark.StorageLevel.MEMORY_ONLY)
# NameError: name 'MEMORY_ONLY' is not defined

df.persist(StorageLevel.MEMORY_ONLY)
# NameError: name 'StorageLevel' is not defined

import org.apache.spark.storage.StorageLevel
# ImportError: No module named org.apache.spark.storage.StorageLevel
```
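For context, here is a stripped-down sketch of the pipeline I'm trying to get working. The paths, DataFrame names, and join key below are placeholders, not my actual schema:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("large-join").getOrCreate()

# Placeholder paths; the real tables are both very large.
df1 = spark.read.parquet("/path/to/first_table")
df2 = spark.read.parquet("/path/to/second_table")

# This is where I want to pass an explicit storage level instead of the
# default, but every spelling I've tried raises one of the errors above.
df1.persist()
df2.persist()

# Placeholder join key.
joined = df1.join(df2, on="id", how="inner")
joined.count()  # force evaluation
```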
Any help would be greatly appreciated.