
I have many XML files on HDFS which I extracted from sequence files using a Java program.

Initially there were only a few files, so I copied the extracted XML files onto my local filesystem and ran the Unix zip command to bundle the XMLs into a single .zip file.

The number of XML files has now increased, and I can no longer copy them onto local because I will run out of disk space.

What I need is to zip all of those XML files (on HDFS) into a single zipped file (on HDFS) without copying them to local.

I couldn't find any lead to start from. Can anyone provide a starting point or some code (even Java MapReduce) so that I can take it further? I can see this could be done with MapReduce, but I have never programmed in it, which is why I am trying other ways.
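For reference, a minimal sketch of the single-process (non-MapReduce) idea: stream each XML file out of HDFS into a java.util.zip.ZipOutputStream that is wrapped around an HDFS output stream, so nothing is ever staged on the local filesystem. This assumes only the standard Hadoop FileSystem API; the class name HdfsZip and the two path arguments are placeholders.

    import java.io.InputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsZip {
        public static void main(String[] args) throws Exception {
            Path srcDir  = new Path(args[0]); // HDFS directory holding the XML files
            Path zipFile = new Path(args[1]); // HDFS path of the resulting .zip

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // The ZipOutputStream writes directly to an HDFS output stream,
            // so nothing touches the local filesystem.
            try (ZipOutputStream zos = new ZipOutputStream(fs.create(zipFile))) {
                // Iterate lazily; with millions of files, avoid loading
                // the whole listing into memory at once.
                RemoteIterator<LocatedFileStatus> files = fs.listFiles(srcDir, false);
                while (files.hasNext()) {
                    LocatedFileStatus status = files.next();
                    zos.putNextEntry(new ZipEntry(status.getPath().getName()));
                    try (InputStream in = fs.open(status.getPath())) {
                        // false = leave zos open for the next entry
                        IOUtils.copyBytes(in, zos, conf, false);
                    }
                    zos.closeEntry();
                }
            }
        }
    }

Note that every byte still flows through the single client JVM, so this avoids local disk but does not parallelise the work; ZipOutputStream in Java 7 and later switches to ZIP64 transparently when the archive outgrows the classic ZIP limits.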

Thanks in advance.

sk7979
  • As pointed out by philantrovert (in the copied answer), take a look at this one: https://stackoverflow.com/questions/7153087/hadoop-compress-file-in-hdfs and change the codec to whatever you like – vefthym Sep 15 '17 at 10:41
  • I am trying to zip the files, not compress them. – sk7979 Sep 15 '17 at 10:46
  • My understanding is that "zipping the files" is a way of "compressing the files", while other compression codecs are also available... https://en.wikipedia.org/wiki/Zip_(file_format) Do you have a different goal? – vefthym Sep 15 '17 at 12:02
  • Yes, you are right @vefthym. But I have millions of XML files, and if I use a compression codec on the folder holding the XMLs, or even on the XMLs themselves, it just compresses the files/folder; the number of XMLs won't be reduced, right? Having that many files will be a problem for me; that's why. – sk7979 Sep 15 '17 at 12:51
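On the many-small-files concern in the last comment: Hadoop also ships an archive tool that packs many HDFS files into a single HAR file entirely on the cluster, without copying anything to local (it archives but does not compress; it runs a MapReduce job under the hood, but no MapReduce code has to be written). A sketch of the command, with placeholder paths:

    hadoop archive -archiveName xmls.har -p /user/sk7979 xml-dir /user/sk7979/archived

The archived files remain individually readable through the har:// filesystem scheme, so downstream jobs can still open each XML without unpacking the archive.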

0 Answers