I am reading 7z and zip files in Scala. I do it by reading the bytes of each entry in the archive, as follows:
val zipInputStream = new ZipInputStream(new FileInputStream(file))
var arrayBufferValues = ArrayBuffer[String]()
val buffer = new Array[Byte](1024)
var readData: Int = 0
while ({entry = zipInputStream.getNextEntry; entry != null}) {
  while ({readData = archiveFile.read(buffer); readData != -1}) {
    content7zStream.write(buffer, 0, readData)
    //println(contentBytes.toString())
    arrayBufferValues += content7zStream.toString("UTF-8")
    println(arrayBufferValues.mkString)
  }
  println("Done with processing file ====>>>>> " + Paths.get(file).getFileName + " ---- " + entry.getName)
  parseFilesMap.put(Paths.get(file).getFileName + "^" + entry.getName, arrayBufferValues)
  arrayBufferValues.clear()
  content7zStream.close()
}
However, I am seeing serious performance problems when the 7z file contains multiple CSV files (say about 20 MB).
Processing runs for hours without completing, and sometimes I get an OutOfMemoryError.
Is there a better way to do it or am I missing something here?
Thanks!
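
One thing that stands out in the loop above: on every 1024-byte read, the whole accumulated stream is converted to a string, appended to the buffer, and printed, so the work and memory grow roughly quadratically with the entry size. Below is a minimal sketch of a streaming alternative for the zip case, offered as a starting point rather than a definitive fix. It assumes the entries are UTF-8 text and that each line can be handled on its own. The names `ZipLineReader`, `forEachLine`, and `handle` are hypothetical and not part of the original code. For 7z archives `ZipInputStream` does not apply; Apache Commons Compress has a `SevenZFile` class for that format, which is not shown here.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.zip.{ZipEntry, ZipInputStream}

object ZipLineReader {
  // Hypothetical helper: calls handle(entryName, line) for every line of
  // every non-directory entry, without keeping whole entries in memory.
  def forEachLine(in: InputStream)(handle: (String, String) => Unit): Unit = {
    val zip = new ZipInputStream(in)
    try {
      var entry: ZipEntry = zip.getNextEntry
      while (entry != null) {
        if (!entry.isDirectory) {
          // A fresh reader per entry; it is deliberately not closed here,
          // because closing it would also close the underlying zip stream.
          val reader = new BufferedReader(
            new InputStreamReader(zip, StandardCharsets.UTF_8))
          var line = reader.readLine()
          while (line != null) {
            handle(entry.getName, line)
            line = reader.readLine()
          }
        }
        entry = zip.getNextEntry
      }
    } finally zip.close()
  }
}
```

With a file on disk, the input could be something like `new BufferedInputStream(new FileInputStream(file))`. If the full content of an entry really is needed as a single string, the conversion would be done once after the inner loop finishes, not on each read.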