
I'm looking for the best way to develop and package different variants of a library, built with different compile settings but for the same ABI, and then select the best fit at runtime. In more concrete terms, I'd like a NEON and a non-NEON armeabi-v7a build.

The native library has a public C interface that third parties link to. They seem to need to link to one of the variants to prevent link errors, but I'd like to load the alternative variant at runtime if it's a better fit for the device, and have the runtime loader do the correct relocations.

From what I've seen so far, both variants need the same file name, so they have to go in different folders. Subfolders under the ABI folder don't seem to get copied by the package installation process, so that approach doesn't work. The best suggestion I've seen so far is to manually copy one variant from the res folder to a known device path and call System.load() with the full path (System.loadLibrary() only accepts a library name). Reference: https://groups.google.com/forum/#!topic/android-ndk/zu_dmcmUlMo

  1. Is this still the best/recommended approach?
  2. How will this interact with the binary translation done on non-arm devices? (Although I can supply an x86 build, some third parties may leave it out of their apk).

I'm assuming cpufeatures on a device using binary translation will not report the CPU family as ARM, so my proposed solution would be to build a standard armeabi-v7a library in the normal way (which I guess will get binary translated), and ship a NEON-supporting library in res/raw. Then at runtime, if cpufeatures reports an ARM CPU with NEON support, copy out that library and call System.load() with the full path. Can anyone see any problems with that approach?

tangobravo

1 Answer


If you explicitly want to have two different builds of a lib, then yes, it's probably the best compromise.

First off, note that many libraries that can use NEON can be built with those parts runtime-enabled, so you get a normal ARMv7 build that doesn't strictly require NEON but enables those code paths at runtime if NEON is detected. libav/FFmpeg do that, as do many other similar libraries. This gives you a single ARMv7 binary that fully utilizes NEON where applicable while still working on the few ARMv7 devices without NEON.
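As a rough illustration of that runtime-enabled layout, here is a minimal sketch in C. All names (`dsp_init`, `dsp_dot`, `dot_neon_standin`) are hypothetical and not taken from any particular library. The "NEON" routine is a plain C stand-in so the sketch builds anywhere; in a real project it would presumably live in a separate file compiled with NEON enabled, and `have_neon` would come from a runtime check such as cpufeatures.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every implementation of the routine. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Portable fallback, safe on any ARMv7 device. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for a NEON version (assumption: the real one would use
 * intrinsics or assembly in a file built with NEON enabled). It processes
 * four lanes per step to mimic the shape of a SIMD loop. */
static float dot_neon_standin(const float *a, const float *b, size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t lane = 0; lane < 4; lane++)
            acc[lane] += a[i + lane] * b[i + lane];
    float sum = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++)          /* leftover elements */
        sum += a[i] * b[i];
    return sum;
}

/* The pointer starts at the safe fallback until dsp_init() is called. */
static dot_fn dot_impl = dot_scalar;

/* Call once at startup with the result of the runtime NEON check. */
void dsp_init(int have_neon) {
    dot_impl = have_neon ? dot_neon_standin : dot_scalar;
}

/* Public entry point: callers never see which variant is in use. */
float dsp_dot(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    return dot_impl(a, b, n);
}
```

Callers would run `dsp_init(...)` once and then only use `dsp_dot(...)`. The cost is one indirect call per routine, which is why this layout tends to suit larger DSP-style functions better than tiny inlined helpers.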

If you're trying to use compiler autovectorization, or if this is a library where the NEON routines aren't easily confined to restricted parts that are enabled at runtime (or hoping to gain extra performance by building the whole library with NEON enabled), your approach sounds sane.

Keep in mind that you want to have at least one native library that is packaged "normally" (which you seem to have, but which has been an issue in e.g. https://stackoverflow.com/a/29329413/3115956). On installation, the installer picks the best match of the bundled architectures and only extracts the libs from that one, and runs the process in that mode. On devices with multiple ABIs (32 and 64 bit), this is essential since if the process is started in a different mode it's too late to switch mode once you try to load a library in a different form.

On an x86 device that emulates ARM binaries, at least the cpufeatures library will return ARM if the process is running in ARM mode. If you use system properties to find the primary and secondary ABIs, you won't know which of them the current process is using though.
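A minimal sketch of such a check from native code, assuming the NDK cpufeatures library is linked in. `android_getCpuFamily()`, `android_getCpuFeatures()`, `ANDROID_CPU_FAMILY_ARM` and `ANDROID_CPU_ARM_FEATURE_NEON` come from `cpu-features.h`; the two function names below are hypothetical. What the feature bits look like under binary translation is not something this sketch can tell you, so it would need testing on a real device.

```c
#include <stdint.h>

#ifdef __ANDROID__
#include <cpu-features.h>   /* NDK cpufeatures library */
#endif

/* Pure decision rule (hypothetical helper), kept separate so it can be
 * exercised off-device: only pick the NEON build on ARM with NEON. */
int should_use_neon_variant(int family_is_arm, int has_neon) {
    return family_is_arm && has_neon;
}

/* Asks cpufeatures about the running process. In a binary compiled for
 * ARM the family check is expected to report ARM, so the feature bit is
 * what actually decides. */
int device_wants_neon_variant(void) {
#ifdef __ANDROID__
    int is_arm = android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM;
    int neon =
        (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0;
    return should_use_neon_variant(is_arm, neon);
#else
    return 0;   /* not built for Android: stay on the baseline variant */
#endif
}
```

The result could then be passed back to the Java side to decide whether to copy out and load the NEON build from res/raw.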

EDIT: x86 devices with binary translation actually seem to be able to load an armeabi library even if the same process already has loaded some bundled x86 libraries as well. So apparently this translation is done on a per library basis, not like 32 vs 64 bit, where a certain mode is chosen for the process at startup, which excludes loading any libraries of the other variant.

mstorsjo
  • Thanks. Are you sure cpufeatures returns ARM on x86 devices? I thought they did a binary translation to x86 at install time rather than actual ARM emulation? I need to get an x86 device to test at my end really. – tangobravo Apr 08 '15 at 08:14
  • Although I will try to move to runtime checks in general, I assume every call needs to check for neon in the general (non-NEON compiled) code and then call into a function from a separate file compiled with NEON enabled. That's pretty messy and stops NEON code from being inlined. – tangobravo Apr 08 '15 at 08:35
  • Well, the cpufeatures library only returns the architecture it was compiled for (have a look at `android_cpuInitFamily` in `/sources/android/cpufeatures/cpu-features.c`), so if it originally was an ARM binary, it will always report ARM. – mstorsjo Apr 08 '15 at 09:30
  • I'm pretty sure the binary translation happens at runtime, not at install time, but it seems it is more powerful than I thought. I tried building an APK that contains both arm and x86 libraries, plus an arm-only library in res/raw. Since there's an x86 version, the process will use that one, but the arm library from res/raw can also be loaded just fine into the same process, so apparently the translation is per library. This is contrary to e.g. armeabi vs arm64-v8a, where you can't load a 32 bit library if the process was started in 64 bit mode. I'll edit the post and clarify this. – mstorsjo Apr 08 '15 at 09:32
  • And yes, you can't have inlined NEON code if you have runtime checks. Many libraries (e.g. codecs and similar code) mostly have NEON code in DSP routines (e.g. "do convolution over this array"), and have setup code that determines the right version of such functions to use, and you then only use them via function pointers. You may miss out on some potential inlined use somewhere, but you still get the main bulk benefit from SIMD. Most codec libraries I've seen lately at least do things this way. – mstorsjo Apr 08 '15 at 09:34
  • Yup, there are definitely a few chunky functions that would still make sense with a runtime check. Things like matrix4 * vector4 have really nice NEON implementations (and autovectorize pretty well) but it's annoying they can't be inlined. I'll need to do some benchmarks to see how significant the function call overhead is in those cases. I've also got some NEON implementations in member functions of classes, which will be a bit of a pain to extract to a separate file. All to support the tiny fraction of Tegra 2 devices out there. Ah well. – tangobravo Apr 08 '15 at 09:39
  • Ok, for repeated matrix/vector operations, it probably makes a lot of sense to have such things inlined, although calls via function pointers probably aren't prohibitively expensive either. Yeah, for ARMv7 without NEON, it sure is a bit annoying, but once you get it set up for runtime detection, you can use it for other cases (e.g. different variants of SSE on x86) as well. – mstorsjo Apr 08 '15 at 09:41