0

I'm wondering if there is a way to delete the already decompressed portion of a compressed file while it is being decompressed. I've got an external backup of the compressed file, so I'm not worried about losing data. The file is a bz2. I want to do this because I've only got 50 GB available on the drive and the compressed file is 33 GB. If I can't delete portions of the compressed file while extracting, there won't be enough space for the decompressed file.

There are other things I could do to get around this but I am interested to know if what I mentioned above is possible.

keyneom
  • Are you writing a program that does this? If so, what language is it in, and what BZip2 library are you using? If not, then -- this doesn't seem like a programming question, and it probably belongs at [Super User](http://superuser.com/) instead. – ruakh Dec 31 '12 at 18:16
  • I suppose my question was whether you knew of how I could achieve the desired result. I would be willing to use bash, python, or php to get the job done (they are on the server already) so I would be willing to write a program if you know of a language and library capable of this. I'm not positive whether or not Super User would be a better place to ask this but I'll ask there if you feel it fits better. – keyneom Dec 31 '12 at 18:35
  • See http://stackoverflow.com/questions/9995093/how-to-make-holes-in-file-to-rease-data-by-c-in-linux. After you read a block from the compressed file, use that technique to replace the block with a hole. – Barmar Dec 31 '12 at 18:57
  • Actually, wait a sec, I'm confused. If the compressed file is 33GB, then the *uncompressed* file is probably too big to fit in your 50GB of available space, no? – ruakh Dec 31 '12 at 20:18
  • Nope, approximately 42GB. – keyneom Dec 31 '12 at 20:28

3 Answers

3

In general, it is not possible to delete the initial portion of a file - you can only truncate a trailing portion of it.

Newer Linux kernels, however, support punching holes into files on specific filesystems, using the fallocate() system call. There is a corresponding fallocate utility that can be used for the same purpose, although you need a relatively recent version (2.21 or later) of the util-linux package for that utility to include hole-punching support.

Keep in mind that hole punching is still relatively new and kernel bugs still pop up, so you might be better off just cleaning up your filesystem to free some space.
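As a rough illustration of the fallocate() approach, here is a minimal Python sketch that decompresses a bz2 file chunk by chunk and punches a hole over each compressed chunk once its output has been written. It rests on several assumptions: a 64-bit Linux system, a filesystem that supports hole punching, and calling fallocate() through ctypes because Python's standard library has no wrapper for it. The names `punch_hole` and `decompress_freeing_input` are hypothetical, and the flag values are the ones from the Linux `<linux/falloc.h>` header. Running it destroys the compressed file, so it only makes sense when a backup exists.

```python
import bz2
import ctypes
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) of the file behind fd (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    # Assumes a 64-bit off_t, as on 64-bit Linux.
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def decompress_freeing_input(src_path, dst_path, release=punch_hole,
                             chunk_size=4 * 1024 * 1024):
    """Decompress src_path to dst_path, releasing each input chunk after use."""
    decomp = bz2.BZ2Decompressor()
    # The source must be opened for writing, or fallocate() will refuse.
    with open(src_path, "r+b") as src, open(dst_path, "wb") as dst:
        offset = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            data = chunk
            while data:
                dst.write(decomp.decompress(data))
                if decomp.eof:
                    # Another bz2 stream may follow (e.g. concatenated files).
                    data = decomp.unused_data
                    decomp = bz2.BZ2Decompressor()
                else:
                    data = b""
            # Make sure the output is on disk before discarding the input.
            dst.flush()
            os.fsync(dst.fileno())
            release(src.fileno(), offset, len(chunk))
            offset += len(chunk)
```

A call such as `decompress_freeing_input("backup.bz2", "backup")` (file names made up here) would leave the compressed file at its original apparent size but with its blocks freed. Only whole filesystem blocks inside a punched range are actually released.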

thkala
  • This looks like it is the only answer that could accomplish what I am looking for but taking into account the things you mentioned, even if my file-system did support this (it doesn't), I probably wouldn't use it. I think I will just remove the compressed file and then transfer the file decompressed over an sftp connection--takes days OTL. Thanks all for the responses! – keyneom Dec 31 '12 at 20:39
2

If I understand you right, you want to delete the portions at the beginning of a compressed file once they have been read, decompressed and written.

This is generally impossible since under Unix there is no way to delete an initial part of a file without rewriting the rest of it (it is possible to truncate a file from the end without rewriting but that does not solve the problem at hand). File systems with the concept of holes may be an option, though.

However, maybe it is possible for you to create smaller compressed files, for example 33 compressed files of 1 GB each. Then it is easy to remove each file once you have uncompressed it.
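If the archive can be delivered in pieces, the consuming side could look roughly like the Python sketch below. It is only an illustration under assumptions: the function name `decompress_parts` is hypothetical, and the parts are assumed to be passed in the correct order. Because a single streaming decompressor is carried across parts, the pieces may be either independently compressed bz2 files or raw byte slices of one large bz2 file.

```python
import bz2
import os


def decompress_parts(part_paths, dst_path, chunk_size=4 * 1024 * 1024):
    """Decompress ordered parts into dst_path, deleting each part once consumed."""
    decomp = bz2.BZ2Decompressor()
    with open(dst_path, "wb") as dst:
        for part in part_paths:
            with open(part, "rb") as src:
                while True:
                    data = src.read(chunk_size)
                    if not data:
                        break
                    while data:
                        dst.write(decomp.decompress(data))
                        if decomp.eof:
                            # A new bz2 stream starts here (independent parts).
                            data = decomp.unused_data
                            decomp = bz2.BZ2Decompressor()
                        else:
                            data = b""
            # Flush the output to disk before removing the part it came from.
            dst.flush()
            os.fsync(dst.fileno())
            os.remove(part)
```

Peak disk usage is then the decompressed output so far plus the parts not yet processed, rather than both full files at once.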

Jens
  • The idea of using smaller files to begin with is smart. I think tar can actually do that so that you can store backups on multiple tapes. Additionally, the split command doesn't remove the original file until all the parts have been created (so I can't split the file now). Unfortunately the beastly 33GB file is what I was given to work with. – keyneom Dec 31 '12 at 20:34
  • Plan B: Invest 50 bucks in a 500+GB disk and never worry again :-) – Jens Dec 31 '12 at 21:22
1

The most obvious solution is to write a filter that reads the decompressed output and picks out whatever you need from it.

bunzip2 -c compressedfile.bz2 | yourfilterprogram

-c directs bunzip2 to decompress to stdout.

Using this technique, the uncompressed file is not stored on disk at all.
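For illustration, the same streaming idea can be sketched in Python, with the filter and the decompression in one process. The function name `count_matching_lines` and the choice of "count lines containing a byte string" as the filter are assumptions made up for this example. What the sketch is meant to show is that `bz2.open` yields decompressed data incrementally, so nothing is written to disk.

```python
import bz2


def count_matching_lines(path, needle):
    """Stream a bz2 file and count the lines that contain the bytes `needle`."""
    count = 0
    # bz2.open decompresses lazily; only a small buffer is held in memory.
    with bz2.open(path, "rb") as stream:
        for line in stream:
            if needle in line:
                count += 1
    return count
```

A call like `count_matching_lines("compressedfile.bz2", b"ERROR")` would then stand in for `yourfilterprogram`, with the search string chosen purely as an example.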

Phil Frost
wallyk
  • Don't you need the `-c` option to send the output to stdout, rather than writing the uncompressed file? Good luck to all. – shellter Dec 31 '12 at 18:43
  • @shellter: `-c` is not mentioned in the man page for *bunzip2*, only for *bzip2*. Anyway, `bzcat` is probably a better choice for this use. – wallyk Dec 31 '12 at 18:46
  • Not sure I am understanding properly. I want all of the contents of the file decompressed. I want to eliminate the parts of the compressed file that have already been decompressed. Sounds like this decompresses and then returns just a portion of the compressed file. – keyneom Dec 31 '12 at 20:30