I have created a Spark DataFrame with 500k rows. If I convert it to a pandas DataFrame using pandas_df = spark_df.toPandas(), it takes a long time and the connection drops. How can I write a loop that pulls 100k rows at a time from the Spark DataFrame into a pandas DataFrame, iterating 5 times to produce 5 DataFrames of 100k rows each?
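One way to sketch this, under the assumption that `spark_df` is the existing 500k-row DataFrame: attach a sequential row number, then filter the DataFrame into 100k-row slices and convert each slice separately. The names `_id`, `_row_num`, and `chunk_size` below are illustrative, not part of any API.

```python
# A minimal sketch of chunked toPandas() conversion, assuming an
# existing DataFrame `spark_df`; helper column names are placeholders.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

chunk_size = 100_000

# Attach a stable sequential index. Note: a window with no partition
# clause funnels all rows through one partition -- acceptable at 500k
# rows, but it will not scale to very large data.
indexed = (spark_df
           .withColumn("_id", F.monotonically_increasing_id())
           .withColumn("_row_num",
                       F.row_number().over(Window.orderBy("_id")))
           .drop("_id"))

total = indexed.count()
pandas_dfs = []
for start in range(1, total + 1, chunk_size):
    chunk = (indexed
             .filter((F.col("_row_num") >= start) &
                     (F.col("_row_num") < start + chunk_size))
             .drop("_row_num"))
    pandas_dfs.append(chunk.toPandas())  # one ~100k-row pandas frame per pass
```

Each pass converts only one slice, so no single toPandas() call has to move all 500k rows at once.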
- What is your end goal? You can try exporting the dataframe into CSV and importing it back to pandas in chunks (a sketch of this appears after these comments). – drum Feb 16 '21 at 05:47
- Or you can try this to split the data: https://stackoverflow.com/questions/48884960/how-to-slice-a-pyspark-dataframe-in-two-row-wise – drum Feb 16 '21 at 05:47
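A rough sketch of the CSV route suggested in the first comment, again assuming `spark_df` exists; the output path `/tmp/spark_export` is a hypothetical example:

```python
# Write once from Spark, then read back into pandas in fixed-size
# chunks. The directory "/tmp/spark_export" is an illustrative path.
import glob
import pandas as pd

# Coalesce to one partition so Spark emits a single CSV part file.
spark_df.coalesce(1).write.mode("overwrite").csv("/tmp/spark_export", header=True)

# Spark writes "part-*" files inside the target directory, not a single
# file at the given path.
csv_path = glob.glob("/tmp/spark_export/part-*.csv")[0]

# pandas reads the file back lazily, 100k rows at a time.
for chunk in pd.read_csv(csv_path, chunksize=100_000):
    ...  # each `chunk` is a pandas DataFrame of up to 100k rows
```

This avoids holding the whole dataset in driver memory during conversion, at the cost of a round trip through disk.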