
I have created a Spark DataFrame with 500k rows. Converting it to a pandas DataFrame with pandas_df = spark_df.toPandas() takes a long time and the session disconnects. How can I write a loop that pulls 100k rows at a time from the Spark DataFrame into a pandas DataFrame, iterating 5 times to create 5 DataFrames of 100k rows each?
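Something like the sketch below is what I am after. It is untested; the _row column name, the global ordering via monotonically_increasing_id, and the chunk size are just illustrations of the idea, not working code:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Number every row, then pull 100k-row slices one at a time.
    # A Window with no partitioning moves all rows through a single
    # partition, so this indexing step can itself be slow on big data.
    w = Window.orderBy(F.monotonically_increasing_id())
    indexed = spark_df.withColumn("_row", F.row_number().over(w))

    chunk_size = 100_000
    pandas_dfs = []
    for i in range(5):
        lo, hi = i * chunk_size + 1, (i + 1) * chunk_size
        chunk = indexed.filter(F.col("_row").between(lo, hi)).drop("_row")
        pandas_dfs.append(chunk.toPandas())  # 5 pandas DataFrames, 100k rows each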

Nitin
  • What is your end goal? You can try exporting the DataFrame to CSV and importing it back into pandas in chunks (see the sketch after these comments). – drum Feb 16 '21 at 05:47
  • Or you can try this to split the data row-wise: https://stackoverflow.com/questions/48884960/how-to-slice-a-pyspark-dataframe-in-two-row-wise – drum Feb 16 '21 at 05:47
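For reference, a minimal sketch of the CSV route drum suggests, assuming the driver can write to /tmp/spark_export (a hypothetical path) and the exported file fits on local disk:

    import glob
    import pandas as pd

    # Write the Spark DataFrame out as a single CSV file (coalesce(1)
    # merges the partitions into one output file), then let pandas
    # stream it back in 100k-row chunks without loading it all at once.
    spark_df.coalesce(1).write.mode("overwrite").csv("/tmp/spark_export", header=True)
    csv_path = glob.glob("/tmp/spark_export/part-*.csv")[0]  # Spark names the file part-*
    pandas_dfs = list(pd.read_csv(csv_path, chunksize=100_000))  # 5 DataFrames of 100k rows

This avoids pushing all 500k rows through toPandas() in one collect, at the cost of a round trip through disk.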

0 Answers