I have a 5 GB file that I want to read in chunks of, say, 2 MB. Using java.io.InputStream works fine, so I measured it as follows:
static final byte[] buffer = new byte[2 * 1024 * 1024];

public static void main(String[] args) throws IOException {
    while (true) {
        InputStream is = new FileInputStream("/tmp/log_test.log");
        long bytesRead = 0;
        int readCurrent;
        long start = System.nanoTime();
        while ((readCurrent = is.read(buffer)) > 0) {
            bytesRead += readCurrent;
        }
        long end = System.nanoTime();
        is.close(); // close before the next iteration reopens the file
        System.out.println(
            "Bytes read = " + bytesRead + ". Time elapsed = " + (end - start)
        );
    }
}
RESULT = 2121714428
On average a pass takes 2121714428 nanos. This is because the implementation reads into a malloc'ed or stack-allocated native buffer and then calls (*env)->SetByteArrayRegion(env, bytes, off, nread, (jbyte *)buf); to copy that buffer into the Java array, as shown here. So memcpy takes a pretty large amount of CPU time.
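For context, this is roughly what the JDK's native readBytes (io_util.c) does under FileInputStream.read(byte[]). It is a simplified sketch with error handling omitted, where IO_Read and GET_FD stand in for JDK-internal macros, so it is not compilable standalone:

/* Simplified sketch of the JDK's readBytes() from io_util.c. */
#define BUF_SIZE 8192

jint readBytes(JNIEnv *env, jobject this, jbyteArray bytes,
               jint off, jint len, jfieldID fid) {
    char stackBuf[BUF_SIZE];
    char *buf;
    jint nread;

    /* A 2 MB request exceeds BUF_SIZE, so a native buffer is malloc'ed. */
    if (len > BUF_SIZE) {
        buf = malloc(len);
    } else {
        buf = stackBuf;
    }

    /* Step 1: read(2) into the native buffer. */
    nread = IO_Read(GET_FD(this, fid), buf, len);
    if (nread > 0) {
        /* Step 2: copy the native buffer into the Java heap array.
           This is the memcpy that eats the CPU time. */
        (*env)->SetByteArrayRegion(env, bytes, off, nread, (jbyte *)buf);
    }
    if (buf != stackBuf) {
        free(buf);
    }
    return nread;
}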
The JNI spec says only that:
Inside a critical region, native code must not call other JNI functions, or any system call that may cause the current thread to block and wait for another Java thread. (For example, the current thread must not call read on a stream being written by another Java thread.)
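To make the spec's concern concrete, the forbidden pattern would look roughly like this (a contrived sketch; badRead and the pipe setup are hypothetical, with pipefd assumed to be the read end of a pipe that another Java thread writes to):

JNIEXPORT jint JNICALL Java_com_test_Main_badRead
  (JNIEnv *env, jclass jc, jint pipefd, jbyteArray arr) {
    size_t len = (size_t) (*env)->GetArrayLength(env, arr);
    void *buf = (*env)->GetPrimitiveArrayCritical(env, arr, NULL);
    /* Forbidden: this read(2) blocks until another Java thread writes to
       the pipe. If the VM has paused that writer thread (e.g., for a GC
       that our critical section is holding off), the two threads deadlock. */
    ssize_t n = read(pipefd, buf, len);
    (*env)->ReleasePrimitiveArrayCritical(env, arr, buf, 0);
    return (jint) n;
}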
Given that, I don't see any problem with reading from a regular file within a critical section: a read from a regular file blocks only briefly and does not depend on any other Java thread. Something like this:
static final byte[] buffer = new byte[2 * 1024 * 1024];

static {
    System.loadLibrary("nativeread"); // hypothetical name of the JNI library below
}

public static void main(String[] args) throws IOException {
    while (true) {
        int fd = open("/tmp/log_test.log");
        long bytesRead = 0;
        int readCurrent;
        long start = System.nanoTime();
        while ((readCurrent = read(fd, buffer)) > 0) {
            bytesRead += readCurrent;
        }
        long end = System.nanoTime();
        close(fd); // avoid leaking a file descriptor on every iteration
        System.out.println("Bytes read = " + bytesRead + ". Time elapsed = " + (end - start));
    }
}

private static native int open(String path);
private static native int read(int fd, byte[] buf);
private static native void close(int fd);
JNI functions:
#include <jni.h>
#include <fcntl.h>
#include <unistd.h>

JNIEXPORT jint JNICALL Java_com_test_Main_open
  (JNIEnv *env, jclass jc, jstring path) {
    const char *native_path = (*env)->GetStringUTFChars(env, path, NULL);
    int fd = open(native_path, O_RDONLY);
    (*env)->ReleaseStringUTFChars(env, path, native_path);
    return fd;
}

JNIEXPORT jint JNICALL Java_com_test_Main_read
  (JNIEnv *env, jclass jc, jint fd, jbyteArray arr) {
    size_t java_array_size = (size_t) (*env)->GetArrayLength(env, arr);
    /* read(2) straight into the (pinned or GC-suspended) Java array,
       skipping the intermediate native buffer and its memcpy. */
    void *buf = (*env)->GetPrimitiveArrayCritical(env, arr, NULL);
    ssize_t bytes_read = read(fd, buf, java_array_size);
    (*env)->ReleasePrimitiveArrayCritical(env, arr, buf, 0);
    return (jint) bytes_read;
}

JNIEXPORT void JNICALL Java_com_test_Main_close
  (JNIEnv *env, jclass jc, jint fd) {
    close(fd);
}
RESULT = 1179852225
Running this in a loop takes 1179852225 nanos on average, which is almost twice as fast.
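For comparison, the extra native-to-heap copy can also be avoided in pure Java by reading through a FileChannel into a direct ByteBuffer. This is a sketch under my assumptions (the class name DirectReadBench is made up, and since the data lands in off-heap memory rather than a byte[], the numbers are not directly comparable to the benchmarks above):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectReadBench {
    // Direct buffer: the kernel fills it without an extra on-heap copy.
    static final ByteBuffer buffer = ByteBuffer.allocateDirect(2 * 1024 * 1024);

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(
                Paths.get("/tmp/log_test.log"), StandardOpenOption.READ)) {
            long bytesRead = 0;
            int readCurrent;
            long start = System.nanoTime();
            while ((readCurrent = ch.read(buffer)) > 0) {
                bytesRead += readCurrent;
                buffer.clear(); // reuse the buffer for the next chunk
            }
            long end = System.nanoTime();
            System.out.println("Bytes read = " + bytesRead
                    + ". Time elapsed = " + (end - start));
        }
    }
}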
Question: What's the actual problem with reading from a regular file within a critical section?