I'm running a MapReduce task on gzipped .arc files. Similar to this question, I'm running into trouble because Gzip decompression happens automatically (the files have a .gz extension), and along the way carriage-return/newline sequences come out as plain newlines, as in Unix line endings. This makes the input completely unreadable, since parsing it depends on specific character counts embedded in the file. I would like to disable the automatic Gzip decompression so that I can do it correctly in my mapper instead. I have tried:
-jobconf stream.recordreader.compression=none
But that doesn't seem to have any effect on the decompression. Is there any way I can prevent automatic Gzip decompression of my input?
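For reference, here is a minimal sketch in Python of the kind of in-mapper decompression described above. It rests on several assumptions that are not confirmed anywhere in this post: that the raw, still-compressed bytes can be made to reach the mapper as a binary stream, that each record starts with a space-separated header line whose last field is the body length in bytes, and that records are separated by a blank line. The function name `iter_records` is made up for illustration.

```python
import gzip


def iter_records(raw_stream):
    """Yield (header_fields, body_bytes) from a gzipped, length-prefixed stream.

    Assumption: each record is a header line whose LAST space-separated
    field is the body length in bytes, followed by exactly that many bytes.
    """
    # Binary mode means no newline translation: b"\r\n" stays two bytes,
    # so the byte counts in the headers remain valid.
    with gzip.GzipFile(fileobj=raw_stream, mode="rb") as gz:
        while True:
            header = gz.readline()
            if not header:
                break  # end of stream
            if not header.strip():
                continue  # blank separator line between records
            fields = header.rstrip(b"\r\n").split(b" ")
            length = int(fields[-1])  # assumed: body length in bytes
            body = gz.read(length)
            yield fields, body
```

If the compressed bytes did arrive on standard input, the call would look like `iter_records(sys.stdin.buffer)`. Whether the framework can be configured to hand them over untouched is exactly the open question here.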
Thanks, -Geoff