
I am trying to duplicate a 30-row dataset to around 600 million rows. I am currently using a for loop that repeatedly unions the DataFrame with itself, but it is taking a long time. Is there a better way to create this many duplicate rows in PySpark?
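
For reference, here is a minimal sketch of what I am doing now (the schema, column names, and exact loop shape are placeholders for my real code):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder for my real 30-row DataFrame; the schema here is made up.
df = spark.createDataFrame([(i, f"val_{i}") for i in range(30)], ["id", "value"])

target_rows = 600_000_000
copies = target_rows // df.count()  # ~20,000,000 unions needed

# Current approach: repeatedly union the DataFrame with itself.
# Each union adds a node to the logical plan, so the plan grows to
# millions of nodes and the driver spends its time planning, not computing.
result = df
for _ in range(copies - 1):
    result = result.union(df)
```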

Thank you.

Samyak Jain
Does this answer your question? [Pyspark: how to duplicate a row n time in dataframe?](https://stackoverflow.com/questions/50624745/pyspark-how-to-duplicate-a-row-n-time-in-dataframe) – vladsiv Dec 14 '21 at 10:12

0 Answers