I have a use case with 800,000 JSON files of about 2 KB each. Our requirement is to merge these small files into a single large file. We have tried doing this in Spark using repartition and coalesce, but it is not efficient: it takes longer than expected. Is there an alternative that achieves the same result with better performance?
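For reference, here is the merge written as a plain single-process baseline, with no Spark involved. This is a minimal sketch under two assumptions: each input file holds exactly one JSON document, and the desired output is newline-delimited JSON (JSON Lines, one record per line). `merge_json_files` is a hypothetical helper, not part of any library.

```python
import json
from pathlib import Path


def merge_json_files(input_dir: str, output_path: str, pattern: str = "*.json") -> int:
    """Concatenate every small JSON file in input_dir into one JSON Lines file.

    Assumption: each input file contains exactly one JSON document.
    Returns the number of files merged.
    """
    count = 0
    out = Path(output_path)
    with out.open("w", encoding="utf-8") as sink:
        # sorted() gives a deterministic output order; drop it if order does not matter
        for path in sorted(Path(input_dir).glob(pattern)):
            if path.resolve() == out.resolve():
                continue  # never read the output file back in
            with path.open("r", encoding="utf-8") as src:
                # Parsing validates the file and lets us re-serialize it on one line
                record = json.load(src)
            sink.write(json.dumps(record, separators=(",", ":")))
            sink.write("\n")
            count += 1
    return count
```

Calling `merge_json_files("/data/small_json", "/data/merged.jsonl")` (paths are placeholders) streams the files one at a time, so memory use stays flat regardless of the file count. If the files are already known to be valid single-line JSON, skipping the parse and copying bytes directly would be cheaper still.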
I'd appreciate your help. Thanks in advance.