
I am trying to invoke a native library from within a Flink pipeline.

Environment: EMR 5.34, Flink 1.13.1.

I have built the uber (fat) JAR and made sure the `.so` file is available inside it. However, I am facing the exception below when starting the Flink application. I'd appreciate any pointers.

```
Caused by: java.lang.UnsatisfiedLinkError: no <<my native library artifact name>> in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)
    at java.lang.Runtime.loadLibrary0(Runtime.java:871)
```

Thank you, Amit

  • Does this answer your question? [Extract and load DLL from JAR](https://stackoverflow.com/questions/4764347/extract-and-load-dll-from-jar) – Botje Jul 13 '22 at 21:06
  • Appreciate your comment. However, for Flink and Spark workloads we usually build fat JARs with all the dependencies. A similar fat JAR works for us in Spark, but Flink may need some other tweaks, hence this question. – Amit Jul 14 '22 at 12:39
  • We successfully used JFastText as part of a Flink workflow running on EMR. There were issues with configuring EMR & Flink to properly account for the native memory used by JFastText, but otherwise it worked fine. Maybe check https://github.com/vinhkhuc/JFastText to see how they package up the native code? – kkrugler Jul 14 '22 at 16:36
  • Thanks for the info. I will check out the POM, but is this a JNI library? I have a `nar` dependency in my POM, and that is where I am facing the above issue. It says <> not in the library path. However, the fat JAR has the NAR file and the `.so` file as well. – Amit Jul 14 '22 at 17:29
  • Are you aware that the library lookup path is completely distinct from the classpath? Just putting `.so` files in your JARs will not make Java find them. The common workaround is extracting the `.so` to the filesystem and pointing `java.library.path` (the *actual* native lookup path) there, which is what my duplicate also does; see the sketch after these comments. – Botje Jul 15 '22 at 07:49
  • Yeah, you are right. I am trying to locate an option/configuration that I can use from the command line when starting the Flink application. I have extracted the file manually for now, and I am also trying to find out how to pass `java.library.path` when starting the Flink application on the EMR cluster. – Amit Jul 15 '22 at 13:16
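For reference, here is a minimal sketch of the extraction approach described in the comments above: copy the bundled `.so` out of the JAR to a real file and load it explicitly. The class name, resource path, and library name are placeholders, not taken from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class NativeLoader {

    /**
     * Extracts a native library bundled in the JAR to a temp file and loads it.
     * System.load needs a real filesystem path, not a classpath resource.
     */
    public static void loadFromJar(String resourceName) throws IOException {
        try (InputStream in = NativeLoader.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found in JAR: " + resourceName);
            }
            Path tmp = Files.createTempFile("native-", ".so");
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
        }
    }
}
```

Calling `NativeLoader.loadFromJar("/libmynative.so")` once per JVM (for example from a static initializer) would then make the library available without touching `java.library.path`.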

1 Answer


I was able to resolve this, at least in "Session" mode, by setting the config parameters below in the `flink-conf.yaml` file.

```yaml
env.java.opts: "-Djava.library.path=<<path to libraries>>"
containerized.master.env.LD_LIBRARY_PATH: "<<path to libraries>>"
containerized.taskmanager.env.LD_LIBRARY_PATH: "<<path to libraries>>"
```
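If the libraries still aren't found after these changes, one quick sanity check (a hypothetical debugging snippet, not part of the fix itself) is to log what the JVM and the container environment actually see:

```java
// Both values should contain the configured <<path to libraries>> once the
// flink-conf.yaml settings above have taken effect.
System.out.println("java.library.path = " + System.getProperty("java.library.path"));
System.out.println("LD_LIBRARY_PATH   = " + System.getenv("LD_LIBRARY_PATH"));
```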

You also need to use `StreamExecutionEnvironment.registerCachedFile` to ship the extracted files from the JobManager to the TaskManagers involved.

On the driver side:

```java
// "directoryWhereFilesAreExtracted" is the path holding the extracted .so files
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.registerCachedFile(directoryWhereFilesAreExtracted, "somekey");
```

Hope this helps if someone is looking for an approach to this kind of scenario. On the TaskManagers, you can access these cached files and copy them into the directory configured in `flink-conf.yaml`, so that they are included in the library path for execution:

```java
getRuntimeContext().getDistributedCache().getFile("somekey")
```

To be able to access the `RuntimeContext`, you need to extend `RichMapFunction`.
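Putting the TaskManager side together, here is a minimal sketch assuming the directory was registered under `"somekey"` as above; the library file name `libmynative.so` and the mapper's types are placeholders for your actual setup:

```java
import java.io.File;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class NativeCallMapper extends RichMapFunction<String, String> {

    @Override
    public void open(Configuration parameters) throws Exception {
        // Resolve the directory shipped via the distributed cache.
        File cached = getRuntimeContext().getDistributedCache().getFile("somekey");
        // Load the extracted .so directly; alternatively, copy it into the
        // directory configured in flink-conf.yaml so java.library.path finds it.
        System.load(new File(cached, "libmynative.so").getAbsolutePath());
    }

    @Override
    public String map(String value) {
        // Call into the native library here.
        return value;
    }
}
```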

Update:

With all the above changes, when I run the Flink pipeline for the first time, it still complains that the library cannot be found. I checked the directory into which I extract the distributed cache, and the libraries are there. Subsequent runs after the first failure are successful. I am not sure why I am seeing this kind of behavior.

Update: I made sure that the directory where we extract the libraries already exists when the EMR cluster is created, and it worked like a charm. I created this directory by configuring a bootstrap action.
