I want to save JSON data as a single file in HDFS. My current approach is to write the data to HDFS with Spark, merge it into a local temporary file (local_tmp_file), and then move it back into HDFS (dest):
getmerge_command = 'hdfs dfs -getmerge ' + dest + ' ' + local_tmp_file
move_command = 'hdfs dfs -moveFromLocal ' + local_tmp_file + ' ' + dest
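For context, this is a minimal sketch of how I invoke these commands from Python. The `merge_to_single_file` helper name is mine; building the commands as argument lists (instead of concatenated strings) avoids shell-quoting issues with paths that contain spaces:

```python
import subprocess

def merge_to_single_file(dest, local_tmp_file):
    # Build the two hdfs commands as argument lists (no shell quoting needed).
    getmerge_command = ["hdfs", "dfs", "-getmerge", dest, local_tmp_file]
    move_command = ["hdfs", "dfs", "-moveFromLocal", local_tmp_file, dest]
    return getmerge_command, move_command

def run_merge(dest, local_tmp_file):
    # Run each step, raising if the hdfs command fails.
    for cmd in merge_to_single_file(dest, local_tmp_file):
        subprocess.run(cmd, check=True)
```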
The problem is that when many of these processes run at the same time, they all write to the local temporary storage and fill up the disk. Does anyone have a solution for this?