I was wondering whether I can write from 2 separate Spark processes into 1 HDFS directory. Will there be a file name collision in this case? Files are written in the form 'part-00000-00c0472e-a01e-4ea6-b247-57114107c762.c000.txt'. Is there a chance that 2 separate Spark processes generate identical file names, so that one overwrites the files of the other?
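For illustration, the `part-00000-<uuid>.c000.txt` pattern in the question embeds a random UUID alongside the partition index, which is what keeps names from colliding across jobs. A minimal sketch of that naming scheme (the `make_part_name` helper is hypothetical, not a Spark API):

```python
import uuid

def make_part_name(partition_index: int) -> str:
    # Hypothetical helper mimicking Spark's output file naming:
    # part-<partition index>-<random job UUID>.c000.txt
    return f"part-{partition_index:05d}-{uuid.uuid4()}.c000.txt"

# Two independent "jobs" each writing partition 0 into the same directory:
name_a = make_part_name(0)
name_b = make_part_name(0)

# Because the UUID component is random, a collision between two jobs is
# astronomically unlikely, so neither file overwrites the other.
print(name_a)
print(name_b)
```

Note that distinct file names only address overwrites of the data files themselves; as the linked question below discusses, concurrent jobs can still conflict in other ways (for example via shared temporary directories during the write).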
- See https://stackoverflow.com/questions/38964736/multiple-spark-jobs-appending-parquet-data-to-same-base-path-with-partitioning — looks like there are aspects to consider. – thebluephantom Oct 14 '18 at 15:57
- Yes, indeed. Thanks for pointing this out. – sparker Oct 15 '18 at 05:57