
I have created a Spark DataFrame with 500k rows. Converting it to a pandas DataFrame with pandas_df = spark_df.toPandas() takes a long time and the session disconnects. How can I write a loop that pulls 100k rows at a time from the Spark DataFrame into a pandas DataFrame, iterating 5 times to create 5 DataFrames of 100k rows each?
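Something like the sketch below is what I am after. It is untested; the _row column name, the global ordering via monotonically_increasing_id, and the chunk size are just illustrations of the idea, not working code:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Number every row, then pull 100k-row slices one at a time.
    # A Window with no partitioning moves all rows through a single
    # partition, so this indexing step can itself be slow on big data.
    w = Window.orderBy(F.monotonically_increasing_id())
    indexed = spark_df.withColumn("_row", F.row_number().over(w))

    chunk_size = 100_000
    pandas_dfs = []
    for i in range(5):
        lo, hi = i * chunk_size + 1, (i + 1) * chunk_size
        chunk = indexed.filter(F.col("_row").between(lo, hi)).drop("_row")
        pandas_dfs.append(chunk.toPandas())  # 5 pandas DataFrames, 100k rows each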

Nitin
  • What is your end goal? You can try exporting the DataFrame to CSV and importing it back into pandas in chunks (see the sketch after these comments). – drum Feb 16 '21 at 05:47
  • Or you can try this to split the data row-wise: https://stackoverflow.com/questions/48884960/how-to-slice-a-pyspark-dataframe-in-two-row-wise – drum Feb 16 '21 at 05:47
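For reference, a minimal sketch of the CSV route drum suggests, assuming the driver can write to /tmp/spark_export (a hypothetical path) and the exported file fits on local disk:

    import glob
    import pandas as pd

    # Write the Spark DataFrame out as a single CSV file (coalesce(1)
    # merges the partitions into one output file), then let pandas
    # stream it back in 100k-row chunks without loading it all at once.
    spark_df.coalesce(1).write.mode("overwrite").csv("/tmp/spark_export", header=True)
    csv_path = glob.glob("/tmp/spark_export/part-*.csv")[0]  # Spark names the file part-*
    pandas_dfs = list(pd.read_csv(csv_path, chunksize=100_000))  # 5 DataFrames of 100k rows

This avoids pushing all 500k rows through toPandas() in one collect, at the cost of a round trip through disk.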

0 Answers