I have a tar.gz file containing a huge number of small XML files (just under 1.5 million, no subdirectories). I want to iterate through them, and I am trying to use Apache Commons Compress to do so. I don't want to extract or write anything to a new file, as is often shown in similar topics. I just want to read the information incrementally (ideally I could stop at some point and continue on another run of the program, but that's secondary).
So, for starters, I thought I would begin small with something like this (the counter exists only for testing, to cut down the run time):
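import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public static void readTar(String in) throws IOException {
    // Unwrap the file: raw bytes -> gzip decompression -> buffering -> tar entries
    try (TarArchiveInputStream tarArchiveInputStream =
            new TarArchiveInputStream(
                new BufferedInputStream(
                    new GzipCompressorInputStream(
                        new FileInputStream(in))))) {
        TarArchiveEntry entry;
        int counter = 0; // test-only limit so the run stops after 1000 entries
        while ((entry = tarArchiveInputStream.getNextTarEntry()) != null && counter < 1000) {
            counter++;
            System.out.println(entry.getFile()); // always prints null
        }
    }
}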
But the result of entry.getFile() is always null, so I cannot work with the entry's content, whereas entry.getName() returns the expected result.
I would be glad if someone could point out my mistake.