What is an effective way to copy small files from multiple HDFS folders into one destination folder? The small files also need to be merged so that MapReduce can process them efficiently.

Manikandan Kannan
  • Possible duplicate: http://stackoverflow.com/questions/3548259/merging-multiple-files-into-one-within-hadoop – ffriend Aug 06 '13 at 13:05

2 Answers

There is DistCp, a MapReduce job that copies files from one or more source folders to a single target folder in parallel. However, it does not merge files. For the merging step, you could try filecrush. (Let me know how this goes!)
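
As a rough sketch of the copy step, the snippet below assembles a `hadoop distcp` invocation with several source folders and one target. The helper names and the example paths are hypothetical, and it assumes the `hadoop` CLI is on the PATH. It only copies; merging would still be a separate step.

```python
import subprocess


def build_distcp_command(sources, target):
    """Return the argument list for: hadoop distcp <src>... <target>.

    `sources` and `target` are HDFS paths chosen by the caller
    (the paths used in the usage note are made up for illustration).
    """
    if not sources:
        raise ValueError("at least one source folder is required")
    # DistCp accepts several source paths followed by one target path.
    return ["hadoop", "distcp", *sources, target]


def run_distcp(sources, target):
    """Run the copy; assumes the `hadoop` CLI is installed and configured."""
    subprocess.run(build_distcp_command(sources, target), check=True)
```

For example, `build_distcp_command(["/data/in/a", "/data/in/b"], "/data/merged")` yields the arguments for `hadoop distcp /data/in/a /data/in/b /data/merged`.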

oae

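You can simply run a default MapReduce job (with the default identity Mapper and Reducer), using the multiple HDFS folders as input and a single destination folder as output. The job writes one output file per reducer, so a small reducer count merges the small files into a few larger ones.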
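
One way to approximate this without writing Java is Hadoop Streaming with `/bin/cat` as both mapper and reducer. This is a stand-in for the default Mapper and Reducer, not the same classes. The sketch below only builds the command line; the function name, the jar location, and the paths are assumptions, since the streaming jar lives in an installation-specific place.

```python
def build_identity_streaming_command(streaming_jar, sources, target, reducers=1):
    """Return the argument list for a pass-through Hadoop Streaming job.

    `streaming_jar` is the path to the hadoop-streaming jar on your
    installation (an assumption: its location varies by distribution).
    """
    if not sources:
        raise ValueError("at least one input folder is required")
    cmd = ["hadoop", "jar", streaming_jar]
    # Each input folder gets its own -input option.
    for src in sources:
        cmd += ["-input", src]
    cmd += [
        "-output", target,
        "-mapper", "/bin/cat",    # pass records through unchanged
        "-reducer", "/bin/cat",   # pass records through unchanged
        # One reducer gives one output file; raise this for large data.
        "-numReduceTasks", str(reducers),
    ]
    return cmd
```

Note that records are sorted during the shuffle, so the merged output does not preserve the original line order, and a single reducer funnels all data through one task.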

Obus