
I am trying to compress a file in HDFS using the following code. The compression works fine when the file is small (say 1 GB), but when the file is around 5 GB the program does not fail; instead it keeps running for two days without producing any result. Based on the INFO message I get, it seems like a cluster issue, although I am not sure.

Following is the INFO message I am getting:

[Screenshot of the INFO message shown on the console]

And the code I am using:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;

public void compressData(final String inputFilePath, final String outputPath) throws DataFabricAppendException {
    CompressionOutputStream compressionOutputStream = null;
    FSDataOutputStream fsDataOutputStream = null;
    FSDataInputStream fsDataInputStream = null;
    CompressionCodec compressionCodec = null;
    CompressionCodecFactory compressionCodecFactory = null;
    try {
        compressionCodecFactory = new CompressionCodecFactory(conf);
        final Path compressionFilePath = new Path(outputPath);
        fsDataOutputStream = fs.create(compressionFilePath);

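        // Wrap the HDFS output stream with a BZip2 compression stream.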
        compressionCodec = compressionCodecFactory
                .getCodecByClassName(BZip2Codec.class.getName());
        compressionOutputStream = compressionCodec
                .createOutputStream(fsDataOutputStream);

        fsDataInputStream = new FSDataInputStream(fs.open(new Path(
                inputFilePath)));

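        // Copy the entire input file through the compressor; 'false' leaves
        // the streams open so finish() can run and finally can close them.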
        IOUtils.copyBytes(fsDataInputStream, compressionOutputStream, conf,
                false);

        compressionOutputStream.finish();
    } catch (IOException ex) {
        throw new DataFabricAppendException(
                "Error while compressing non-partitioned file : "
                        + inputFilePath, ex);
    } catch (Exception ex) {
        throw new DataFabricAppendException(
                "Error while compressing non-partitioned file : "
                        + inputFilePath, ex);
    } finally {
        try {
            if (compressionOutputStream != null) {
                compressionOutputStream.close();
            }
            if (fsDataInputStream != null) {
                fsDataInputStream.close();
            }
            if (fsDataOutputStream != null) {
                fsDataOutputStream.close();
            }
        } catch (IOException e1) {
            LOG.warn("Could not close the compression streams", e1);
        }
    }
}
  • Have a look at this thread: http://stackoverflow.com/questions/7153087/hadoop-compress-file-in-hdfs – Vikas Hardia Feb 14 '14 at 05:11
  • @VikasHardia I am aware of sequence file compression, but in this case my intention is not to use a sequence file. If you check my screenshot, that is the problem I am having. – Binary01 Feb 14 '14 at 08:16

0 Answers