I have to load millions of XML files from S3 and process them in Spark, but the files must be loaded in a specific order. For example, I have added a time-series prefix to the name of each folder. Now I need to sort all the files by that time-series prefix and load them into Spark in the same sorted order.

The order of the files should not change while loading them into a Spark DataFrame.
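To make the ordering concrete, here is a minimal sketch in plain Python of the sort I have in mind. The S3 keys, the `YYYYMMDDTHHMM` prefix format, and the helper name `time_prefix` are all hypothetical and only for illustration; this does not touch Spark itself.

```python
from datetime import datetime

# Hypothetical S3 keys: each folder name starts with a timestamp prefix.
keys = [
    "input/20180207T1030_batch/b.xml",
    "input/20180206T0915_batch/a.xml",
    "input/20180207T0800_batch/c.xml",
]

def time_prefix(key):
    """Parse the folder's timestamp prefix (assumed format: YYYYMMDDTHHMM)."""
    folder = key.split("/")[1]
    return datetime.strptime(folder.split("_")[0], "%Y%m%dT%H%M")

# Sort the keys by timestamp, then number them so that the intended
# load order is explicit rather than implied by list position.
ordered = sorted(keys, key=time_prefix)
load_order = list(enumerate(ordered))

for seq, key in load_order:
    print(seq, key)
```

What I am unsure about is whether Spark keeps this order once the files are read, or whether I would have to carry the sequence number along with the data and sort on it afterwards.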

Can we do this in Spark?

Atharv Thakur
  • _But loading of all files should be done in specific order_ - sounds like you're making really dangerous assumptions here and it is likely to be [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – zero323 Feb 07 '18 at 20:37
  • This was already answered in [another](https://stackoverflow.com/questions/31051107/read-multiple-files-from-a-directory-using-spark) SO question. – beyondfloatingpoint Feb 19 '18 at 17:07

0 Answers