I am trying to duplicate a dataset that has 30 rows until it reaches around 600 million rows. I am currently using a for loop that repeatedly performs a union, but it is taking a lot of time (a sketch of what I am doing is below). Is there a better way to create duplicate rows at this volume in PySpark?
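For reference, this is roughly what the current approach looks like; the DataFrame contents and the copy count are placeholders, not the real data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("duplicate-rows").getOrCreate()

# Stand-in for the real 30-row dataset (the actual source is not shown here).
df = spark.range(30).withColumnRenamed("id", "value")

# Current approach: union the same small DataFrame in a loop.
# Reaching ~600 million rows from 30 needs roughly 20 million iterations,
# and every union grows the logical plan, which is why it is so slow.
num_copies = 20_000_000  # 30 rows * 20M copies = 600M rows
result = df
for _ in range(num_copies - 1):
    result = result.union(df)
```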
Thank you.