I'm new to HDFS/Hadoop and need to know how to compress a file that is in an HDFS directory such as hdfs://sandbox:8020/some/path.

I have tried

      Path p = new Path("/my/path/test1.gz");
      FSDataOutputStream os = fs.create(p);

      GZIPOutputStream gzipOs = new GZIPOutputStream(new BufferedOutputStream(os));

      Path filePath = file.getPath();
      FSDataInputStream is = fs.open(filePath);

      System.out.println("Writing gzip");

      byte[] buffer = new byte[1024];
      int len;
      while((len= is.read(buffer)) != -1){
        gzipOs.write(buffer, 0, len);
      }
      //close resources
      is.close();
      gzipOs.close();

But it doesn't work.

Any suggestions? Thanks in advance.

user3403657
  • Can you be more specific than "it doesn't work"? – Mike Park May 12 '14 at 21:43
  • @S.M.AlMamun I am not using mapreduce, just trying to compress files on hdfs. – user3403657 May 13 '14 at 01:19
  • @climbage Sure. I'm trying to archive files on HDFS. I have a file called test.doc that I'm trying to archive to "archive.bz2". I can create "archive.bz2", but when I open it in 7zip it contains "archive". If I extract it and rename it to "test.doc", then it's fine. How can I create archives on HDFS? I will eventually need to tar and gzip directories, but am just trying to get something to work! – user3403657 May 13 '14 at 01:22
  • You need to call it `test.doc.bz2` so when you extract it it becomes `test.doc` – Mike Park May 13 '14 at 02:42
  • @climbage Yes, thanks, I tried that previously and it works. But now I'm wondering about creating a .tar.gz file. Is it possible to create .tar.gz on hdfs? – user3403657 May 13 '14 at 03:26
  • It's not a duplicate. The former question is about the `cli` and this one is about the `Java API`. – Krzysztof Atłasik Sep 19 '19 at 14:07

1 Answer

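The code below is from Tom White's Hadoop: The Definitive Guide. It compresses standard input to standard output using the codec class named in the first argument.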

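import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

// Compresses standard input to standard output with the codec named in args[0],
// e.g. org.apache.hadoop.io.compress.GzipCodec.
public class StreamCompressor {
  public static void main(String[] args) throws Exception {
    String codecClassname = args[0];
    Class<?> codecClass = Class.forName(codecClassname);
    Configuration conf = new Configuration();
    CompressionCodec codec = (CompressionCodec)
        ReflectionUtils.newInstance(codecClass, conf);
    CompressionOutputStream out = codec.createOutputStream(System.out);
    // Copy with a 4096-byte buffer; false leaves both streams open afterwards.
    IOUtils.copyBytes(System.in, out, 4096, false);
    // Finish writing the compressed data without closing System.out.
    out.finish();
  }
}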
Jerry Ragland
  • Thanks for the code snippet, but this doesn't work for me either. The archive gets created but the content is not named appropriately. I have a file on HDFS called test.doc and I am trying to compress it. I am able to create an archive.bz2 file, but when I open it, it contains "archive". If I rename it to "test.doc" then it's as expected. Why can't I create an archive containing the file I want to compress with the filename? – user3403657 May 13 '14 at 01:18
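
The naming issue raised in the comments can be illustrated with a minimal sketch. A stream written by `java.util.zip.GZIPOutputStream` (the class used in the question) does not record the original file name, and bzip2 has no field for one, so tools such as 7-Zip typically derive the inner name by stripping the archive's extension. The class and method names below (`GzipNaming`, `gzipNameFor`, `gzip`, `gunzip`) are hypothetical, and in-memory streams stand in for HDFS; on a cluster the input and output would presumably come from `fs.open(filePath)` and `fs.create(new Path(...))` as in the question.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; not part of Hadoop or of the original answer.
final class GzipNaming {

    // Append ".gz" to the full original name so that stripping the
    // extension on extraction yields the original file name again.
    static String gzipNameFor(String originalName) {
        return originalName + ".gz";
    }

    // Copy everything from 'in' into a gzip stream wrapped around 'rawOut'.
    // Both streams are closed when the copy finishes.
    static void gzip(InputStream in, OutputStream rawOut) throws IOException {
        try (InputStream src = in;
             GZIPOutputStream gz = new GZIPOutputStream(new BufferedOutputStream(rawOut))) {
            byte[] buffer = new byte[4096];
            int len;
            while ((len = src.read(buffer)) != -1) {
                gz.write(buffer, 0, len);
            }
        }
    }

    // Decompress a gzip byte array back to the original bytes.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int len;
            while ((len = gz.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "contents of test.doc".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        gzip(new ByteArrayInputStream(original), sink);

        // The archive name carries the original name; the stream itself does not.
        System.out.println(gzipNameFor("test.doc"));
        System.out.println(Arrays.equals(original, gunzip(sink.toByteArray())));
    }
}
```

This only covers single-file compression. A .tar.gz of a directory needs a tar writer in front of the gzip stream, which the JDK does not provide, so that would require a third-party library.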