I want to read the last n lines of a large text file that is compressed inside a zip file, without unzipping it.

This is what I have now:

ZipFile zf = new ZipFile(file.getAbsolutePath());
Enumeration<?> entries = zf.entries();
ZipEntry ze = (ZipEntry) entries.nextElement();
BufferedReader in = new BufferedReader(new InputStreamReader(zf.getInputStream(ze)));

void readLastNLines(BufferedReader bf){
//some code here
}

I considered using RandomAccessFile(File file, String mode), but it requires a File as its argument. A zip file cannot be treated like a directory, so I cannot pass the entry to it as a File.

Any ideas?

Appreciate any assistance and inputs.

Thanks!

[EDIT] I figured out a less efficient way to achieve this:

Since RandomAccessFile cannot be used, I used the InputStream approach:

InputStream is = zf.getInputStream(ze);
// available() is only an estimate, so size the buffer from the entry instead.
// Assumes getSize() is known (not -1) and the entry fits in an int-sized array.
byte[] bytes = new byte[(int) ze.getSize()];
int offset = 0;
int read;
// Fill the buffer in order; no reversing is needed to get at the tail.
while (offset < bytes.length
    && (read = is.read(bytes, offset, bytes.length - offset)) != -1) {
  offset += read;
}

String text = new String(bytes, 0, offset);

//Select how many lines you want (some number = number of trailing characters)
System.out.println(text.substring(text.length()-#some number#));
John Powel
  • This is a huge pain in the ass. Check out this thread: http://stackoverflow.com/a/7322581/1417974 – Hans Z Jun 11 '12 at 21:49
  • You can't do random access on compressed stream contents. You either need to uncompress to a temp file or figure out a way to get what you need from one pass through the stream (e.g. read through the stream and keep the last N lines in memory; when you get to the end of the stream, you have the last N lines). – jtahlborn Jun 11 '12 at 22:18
  • @jtahlborn That should be an answer. – Jim Garrison Jun 11 '12 at 22:42
  • Does anybody have a more efficient way to do this? – John Powel Jun 13 '12 at 16:33

2 Answers

You can't do random access on compressed stream contents. You either need to uncompress to a temp file or figure out a way to get what you need from one pass through the stream (e.g. read through the stream and keep the last N lines in memory; when you get to the end of the stream, you have the last N lines).
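
For what it's worth, here is a minimal sketch of the one-pass idea, assuming the entry is UTF-8 text and that the first entry of the archive is the one you want (as in the question). The class and method names (`ZipTail`, `lastLines`, `lastLinesOfFirstEntry`) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipTail {

    // Reads the reader to the end, keeping only the most recent n lines in memory.
    static List<String> lastLines(BufferedReader reader, int n) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Deque<String> window = new ArrayDeque<>(n);
        String line;
        while ((line = reader.readLine()) != null) {
            if (window.size() == n) {
                window.removeFirst(); // drop the oldest line once the window is full
            }
            window.addLast(line);
        }
        return new ArrayList<>(window);
    }

    // Hypothetical convenience wrapper: tail of the first entry in the archive.
    // Assumption: the entry is UTF-8 encoded text.
    static List<String> lastLinesOfFirstEntry(String zipPath, int n) throws IOException {
        try (ZipFile zf = new ZipFile(zipPath)) {
            ZipEntry ze = zf.entries().nextElement();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(zf.getInputStream(ze), StandardCharsets.UTF_8))) {
                return lastLines(in, n);
            }
        }
    }
}
```

Memory use stays proportional to the N lines kept, not to the size of the entry, but the whole entry is still decompressed and decoded once.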

jtahlborn
  • You might be able to make that a little faster if you have an upper bound on the number of bytes the last N lines could be. You could seek the InputStream to (uncompressedSize - maxBytesInLastNLines) before creating the InputStreamReader and BufferedReader. That saves you the cost of decoding a ton of UTF-8 text and allocating memory for it. You are just going to throw it away anyway. – John Watts Jun 12 '12 at 15:22
  • @JohnWatts - You can't (easily) seek to a valid character boundary in a multi-byte character encoding. Once you have found the point in the byte stream, you would have to find the end of the current character before you could start converting bytes to chars. – jtahlborn Jun 12 '12 at 17:30
  • You are right that it isn't trivial, but it is actually quite easy with UTF-8 because it is self-synchronizing. I must admit I didn't know that until I read your comment, so thanks for prompting me to look it up. See http://stackoverflow.com/questions/4935034/how-do-i-accomplish-random-reads-of-a-utf8-file. Also, I implicitly assumed the decompressed file was UTF-8 or ASCII. Other encodings might yield other approaches. – John Watts Jun 12 '12 at 17:53
  • Thank you for the advice. I am trying to use the `InputStream` approach. – John Powel Jun 13 '12 at 16:32
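
The skip-and-resynchronize idea from the comments above could look roughly like this. It is only a sketch, assuming the entry is UTF-8 (or ASCII), that the uncompressed size is known (e.g. `ZipEntry.getSize()` did not return -1), and that you can pick an upper bound for the size of the tail. `Utf8TailSkip`, `skipFully` and `tailReader` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class Utf8TailSkip {

    // Skips n bytes unless the stream ends first; skip() alone may skip fewer than asked.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) {
                return; // end of stream
            } else {
                n--; // consumed one byte via read()
            }
        }
    }

    // Returns a reader positioned at the first character boundary at or after
    // (uncompressedSize - maxTailBytes). Assumption: the data is UTF-8.
    static BufferedReader tailReader(InputStream raw, long uncompressedSize, long maxTailBytes)
            throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 1);
        long toSkip = uncompressedSize - maxTailBytes;
        if (toSkip > 0) {
            skipFully(in, toSkip);
            // UTF-8 continuation bytes have the form 10xxxxxx; drop them until
            // a byte that starts a character shows up, then push that byte back.
            int b = in.read();
            while (b != -1 && (b & 0xC0) == 0x80) {
                b = in.read();
            }
            if (b != -1) {
                in.unread(b);
            }
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

The first line read from the returned reader is probably a partial line, so it should be discarded before collecting the last N lines. Skipping still makes the stream decompress the skipped bytes; what it saves is the character decoding and the String allocations.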

Decompression, like decryption and binary deserialization, can only be done from the start. There are some forms of compression where you could do this, but only the simplest forms (Zip and Jar are not examples of these). This is because you don't know what the bytes mean until you have read some, often all, of the bytes before them.

If you want to access portions of a "file" that is compressed, you need to break it up into smaller portions which can be decompressed individually.
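
One way to do that, assuming you control how the archive is written, is to store the text as several zip entries, since each entry is compressed on its own and `ZipFile` can open any entry directly. A rough sketch follows; the class name, the "part-N" entry naming and the chunk size are all made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ChunkedZip {

    // Writes the lines as entries "part-0", "part-1", ... with linesPerChunk lines each.
    // Returns the number of entries written.
    static int writeChunks(List<String> lines, String zipPath, int linesPerChunk)
            throws IOException {
        int chunk = 0;
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipPath))) {
            for (int i = 0; i < lines.size(); i += linesPerChunk) {
                zos.putNextEntry(new ZipEntry("part-" + chunk++));
                int end = Math.min(i + linesPerChunk, lines.size());
                for (String line : lines.subList(i, end)) {
                    zos.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                }
                zos.closeEntry();
            }
        }
        return chunk;
    }

    // Decompresses only the requested entry; the other entries are never inflated.
    static String readChunk(String zipPath, int index) throws IOException {
        try (ZipFile zf = new ZipFile(zipPath)) {
            ZipEntry ze = zf.getEntry("part-" + index);
            if (ze == null) {
                throw new FileNotFoundException("no entry part-" + index);
            }
            try (InputStream in = zf.getInputStream(ze)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

To get the tail, you would read the last entry (and the one before it if the last is too short). The trade-off is a somewhat worse compression ratio, since each chunk is compressed without knowledge of the others.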

Rich
Peter Lawrey