I'm running a Hadoop job on a bunch of gzipped input files. Hadoop should handle this easily (see "mapreduce in java - gzip input files").
Unfortunately, in my case, the input files don't have a .gz extension. I'm using CombineTextInputFormat, which runs my job fine if I point it at non-gzipped files, but I basically just get a bunch of garbage if I point it at the gzipped ones.
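My understanding of why this happens, hedged because I haven't traced it through every Hadoop version: the line record reader picks a decompressor with CompressionCodecFactory.getCodec(Path), which matches on the file-name suffix. A gzipped file without .gz therefore gets no codec, and its compressed bytes are read as if they were text. The toy model below mimics only that suffix lookup. SuffixCodecLookup and its table are hypothetical names, not Hadoop classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SuffixCodecLookup {
    // Hypothetical stand-in for Hadoop's suffix -> codec mapping.
    // Only the idea (lookup by file-name suffix) is modeled here.
    private static final Map<String, String> CODECS_BY_SUFFIX = new LinkedHashMap<>();
    static {
        CODECS_BY_SUFFIX.put(".gz", "GzipCodec");
        CODECS_BY_SUFFIX.put(".bz2", "BZip2Codec");
    }

    // Returns the codec name for a file, or null if no suffix matches.
    static String codecFor(String fileName) {
        for (Map.Entry<String, String> e : CODECS_BY_SUFFIX.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        // No codec found: the reader would get the raw (still compressed) bytes.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(codecFor("events-0001.gz")); // GzipCodec
        System.out.println(codecFor("events-0001"));    // null
    }
}
```

The file contents are never inspected in this lookup, which would explain why the same gzipped data decompresses fine with a .gz name and comes out as garbage without one.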
I've tried searching for quite some time, but the only thing I've turned up is somebody else asking the same question I have, with no answer: "How to force Hadoop to unzip inputs regardless of their extension?"
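The direction I'd expect a fix to take is sniffing the content instead of the name: every gzip stream starts with the bytes 0x1f 0x8b (RFC 1952). Here is a plain-Java sketch, with no Hadoop dependencies, that wraps a stream in GZIPInputStream only when those magic bytes are present. GzipSniffer and its methods are hypothetical names, and the idea of wiring this into a custom RecordReader is an untested assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSniffer {
    // First two bytes of every gzip member (RFC 1952): ID1 = 0x1f, ID2 = 0x8b.
    private static final int GZIP_ID1 = 0x1f;
    private static final int GZIP_ID2 = 0x8b;

    // Decides by content, not by file name: peek at two bytes, rewind,
    // then wrap in GZIPInputStream only if they match the gzip magic.
    static InputStream maybeDecompress(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset(); // the chosen reader must still see the header bytes
        if (b1 == GZIP_ID1 && b2 == GZIP_ID2) {
            return new GZIPInputStream(in);
        }
        return in;
    }

    // Helper for the demo: gzip a string in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Reads a stream fully into a UTF-8 string (Java 7 compatible).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String text = "line one\nline two\n";
        byte[] compressed = gzip(text);
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);
        // Both calls print the same two lines.
        System.out.print(readAll(maybeDecompress(new ByteArrayInputStream(compressed))));
        System.out.print(readAll(maybeDecompress(new ByteArrayInputStream(plain))));
    }
}
```

One caveat if this ends up inside a custom reader: as far as I know, CombineFileInputFormat.isSplitable also relies on the suffix-based codec lookup, so files without .gz may be treated as splittable, and a split that starts mid-file cannot be gunzipped. The input format would likely also need to report these files as non-splittable.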
Anybody got anything?