Background
I need to parse some zip files of various types (getting some inner files content for one purpose or another, including getting their names).
Some of the files are not reachable via file-path, as Android has Uri to reach them, and as sometimes the zip file is inside another zip file. With the push to use SAF, it's even less possible to use file-path in some cases.
For this, we have 2 main ways to handle: ZipFile class and ZipInputStream class.
The problem
When we have a file-path, ZipFile is a perfect solution. It's also very efficient in terms of speed.
However, for the rest of the cases, ZipInputStream could reach issues, such as this one, which has a problematic zip file, and cause this exception:
java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:321)
at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:124)
What I've tried
The only always-working solution would be to copy the file to somewhere else, where you could parse it using ZipFile, but this is inefficient and requires you to have free storage, as well as remove the file when you are done with it.
So, what I've found is that Apache has a nice, pure Java library (here) to parse Zip files, and for some reason its InputStream solution (called "ZipArchiveInputStream") seem even more efficient than the native ZipInputStream class.
As opposed to what we have in the native framework, the library offers a bit more flexibility. I could, for example, load the entire zip file into bytes array, and let the library handle it as usual, and this works even for the problematic Zip files I've mentioned:
org.apache.commons.compress.archivers.zip.ZipFile(SeekableInMemoryByteChannel(byteArray)).use { zipFile ->
for (entry in zipFile.entries) {
val name = entry.name
... // use the zipFile like you do with native framework
gradle dependency:
// http://commons.apache.org/proper/commons-compress/ https://mvnrepository.com/artifact/org.apache.commons/commons-compress
implementation 'org.apache.commons:commons-compress:1.20'
Sadly, this isn't always possible, because it depends on having the heap memory hold the entire zip file, and on Android it gets even more limited, because the heap size could be relatively small (heap could be 100MB while the file is 200MB). As opposed to a PC which can have a huge heap memory being set, for Android it's not flexible at all.
So, I searched for a solution that has JNI instead, to have the entire ZIP file loaded into byte array there, not going to the heap (at least not entirely). This could be a nicer workaround because if the ZIP could be fit in the device's RAM instead of the heap, it could prevent me from reaching OOM while also not needing to have an extra file.
I've found this library called "larray" which seems promising , but sadly when I tried using it, it crashed, because its requirements include having a full JVM, meaning not suitable for Android.
EDIT: seeing that I can't find any library and any built-in class, I tried to use JNI myself. Sadly I'm very rusty with it, and I looked at an old repository I've made a long time ago to perform some operations on Bitmaps (here). This is what I came up with :
native-lib.cpp
#include <jni.h>
#include <android/log.h>
#include <cstdio>
#include <android/bitmap.h>
#include <cstring>
#include <unistd.h>
class JniBytesArray {
public:
uint32_t *_storedData;
JniBytesArray() {
_storedData = NULL;
}
};
extern "C" {
JNIEXPORT jobject JNICALL Java_com_lb_myapplication_JniByteArrayHolder_allocate(
JNIEnv *env, jobject obj, jlong size) {
auto *jniBytesArray = new JniBytesArray();
auto *array = new uint32_t[size];
for (int i = 0; i < size; ++i)
array[i] = 0;
jniBytesArray->_storedData = array;
return env->NewDirectByteBuffer(jniBytesArray, 0);
}
}
JniByteArrayHolder.kt
class JniByteArrayHolder {
external fun allocate(size: Long): ByteBuffer
companion object {
init {
System.loadLibrary("native-lib")
}
}
}
class MainActivity : AppCompatActivity() {
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
setContentView(R.layout.activity_main)
thread {
printMemStats()
val jniByteArrayHolder = JniByteArrayHolder()
val byteBuffer = jniByteArrayHolder.allocate(1L * 1024L)
printMemStats()
}
}
fun printMemStats() {
val memoryInfo = ActivityManager.MemoryInfo()
(getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager).getMemoryInfo(memoryInfo)
val nativeHeapSize = memoryInfo.totalMem
val nativeHeapFreeSize = memoryInfo.availMem
val usedMemInBytes = nativeHeapSize - nativeHeapFreeSize
val usedMemInPercentage = usedMemInBytes * 100 / nativeHeapSize
Log.d("AppLog", "total:${Formatter.formatFileSize(this, nativeHeapSize)} " +
"free:${Formatter.formatFileSize(this, nativeHeapFreeSize)} " +
"used:${Formatter.formatFileSize(this, usedMemInBytes)} ($usedMemInPercentage%)")
}
This doesn't seem right, because if I try to create a 1GB byte array using jniByteArrayHolder.allocate(1L * 1024L * 1024L * 1024L)
, it crashes without any exception or error logs.
The questions
Is it possible to use JNI for Apache's library, so that it will handle the ZIP file content which is contained within JNI's "world" ?
If so, how can I do it? Is there any sample of how to do it? Is there a class for it? Or do I have to implement it myself? If so, can you please show how it's done in JNI?
If it's not possible, what other way is there to do it? Maybe alternative to what Apache has?
For the solution of JNI, how come it doesn't work well ? How could I efficiently copy the bytes from the stream into the JNI byte array (my guess is that it will be via a buffer)?