I created a very large Spark DataFrame with PySpark on my cluster, and it is too big to fit into memory. I also have an autoencoder model built with Keras, which expects a pandas DataFrame (an in-memory object).
What is the best way to bring those two worlds together?
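For illustration, this is roughly what I would do if everything fit in memory (the toy data, column names, and model below are just placeholders, not my real pipeline): collect the Spark DataFrame with `toPandas()` and feed the result to `fit()`. That collect step is exactly what breaks at my scale.

```python
import numpy as np
from pyspark.sql import SparkSession
from tensorflow import keras

spark = SparkSession.builder.getOrCreate()

# Placeholder DataFrame; in reality this is far too large to collect
df = spark.createDataFrame(
    [(float(i), float(i) * 2, float(i) * 3) for i in range(1000)],
    ["f1", "f2", "f3"],
)

# Minimal Keras autoencoder over the three feature columns
inputs = keras.Input(shape=(3,))
encoded = keras.layers.Dense(2, activation="relu")(inputs)
decoded = keras.layers.Dense(3, activation="linear")(encoded)
autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

# The naive bridge: toPandas() pulls the whole dataset onto the driver,
# which is exactly what blows up once the data no longer fits in memory
pdf = df.toPandas()
x = pdf[["f1", "f2", "f3"]].to_numpy(dtype=np.float32)
autoencoder.fit(x, x, epochs=5, batch_size=64)
```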
I found some libraries that provide deep learning on Spark, but it seems they are either only for hyperparameter tuning or don't support autoencoders (e.g. Apache SystemML).
I am surely not the first one to train a neural network on Spark DataFrames. I have a conceptual gap here, please help!