
We have a split function in Python, df_split = np.array_split(df, 2), which splits the DataFrame by rows into multiple DataFrames. How can we achieve the same for a Spark DataFrame?

Tracy
  • Check if this helps https://stackoverflow.com/questions/62107654/efficiently-batching-spark-dataframes-to-call-an-api/62166913#62166913 – Shubham Jain Jun 12 '20 at 05:36

1 Answer


The simplest way is to filter on some condition.

first_half = my_df.filter(condition)
second_half = my_df.filter(~condition)
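
Here condition needs to be a Boolean Column expression. As a sketch, assuming your frame happens to have a numeric column named id (hypothetical), it could look like:

from pyspark.sql import functions as F

# Hypothetical: rows with id below 100 form the first half.
condition = F.col("id") < 100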

You may need to add another field to your frame. You did not say how you want it to be split. If you want it split in half, such as taking every other row, you can add a row number, and the condition then becomes "row number is even" (using modulo).
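
A minimal sketch of that row-number approach, assuming a SparkSession is available; the column name _row and the sample data are placeholders:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
my_df = spark.createDataFrame([(i,) for i in range(10)], ["value"])

# Number the rows. Ordering by a constant gives sequential numbers,
# but note this pulls all rows into a single partition.
w = Window.orderBy(F.lit(1))
numbered = my_df.withColumn("_row", F.row_number().over(w))

# Even row numbers form one half, odd numbers the other.
condition = (F.col("_row") % 2) == 0
first_half = numbered.filter(condition).drop("_row")
second_half = numbered.filter(~condition).drop("_row")

If you want contiguous halves rather than alternating rows, the same _row column works: filter on _row <= numbered.count() // 2 instead (at the cost of an extra count job).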

Bob McCormick