
I have developed three versions (C, RenderScript, and NEON intrinsics) of a video processing algorithm using the Android NDK (using the C++ APIs for RenderScript). The C/RS/NEON code is called at the native level on the NDK side from the Java front end. I found that, for some reason, the NEON version consumes a lot of power compared to the C and RS versions. I used Trepn 5.0 for my power testing.
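
For context, the Java-to-native call path looks roughly like the sketch below. This is only an illustrative guess at the structure: the package, class, and method names (com.example.videofx.NativeFilters.processFrame) and the placeholder per-pixel work are assumptions, not the poster's actual code.

    #include <jni.h>
    #include <cstdint>

    // Placeholder for the real per-frame work; the C, NEON, and RenderScript
    // variants under test would be dispatched from here.
    static void processFrameNative(uint8_t *pixels, int width, int height) {
        for (int i = 0; i < width * height; ++i) {
            pixels[i] = static_cast<uint8_t>(255 - pixels[i]);  // dummy operation
        }
    }

    // Native entry point called from the Java front end via the NDK.
    extern "C" JNIEXPORT void JNICALL
    Java_com_example_videofx_NativeFilters_processFrame(JNIEnv *env, jclass,
                                                        jbyteArray frame,
                                                        jint width, jint height) {
        // Pin (or copy) the Java byte[] holding one grayscale frame.
        jbyte *data = env->GetByteArrayElements(frame, nullptr);
        processFrameNative(reinterpret_cast<uint8_t *>(data), width, height);
        // Copy results back to the Java array and release the native buffer.
        env->ReleaseByteArrayElements(frame, data, 0);
    }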

  1. Can someone clarify the power consumption level for each of these methods: C, RenderScript (GPU), and NEON intrinsics? Which one consumes the most?

  2. What would be the ideal power consumption level for RS code? Since the GPU runs at a lower clock frequency, shouldn't its power consumption be lower?

  3. Do the RenderScript APIs focus on power optimization?

Video - 1920x1080 (20 frames)

  1. C -- 11115.067 ms (0.80 mW)
  2. RS -- 9867.170 ms (0.43 mW)
  3. NEON intrinsics -- 9160 ms (1.49 mW)
  • Your NEON version should be much faster if it's properly implemented - the total *energy* consumption should therefore not be greatly impacted, e.g. twice the power consumption for half the time should have the same impact on battery power consumption, since the total energy consumption is the same. It looks like your NEON implementation needs some optimisation work though, since it's not much faster than your C code? – Paul R Jun 27 '14 at 14:08
  • It's implemented using NEON intrinsics; I haven't done assembly coding. Comparatively, does NEON consume more power than RS? – Kaliuday Jun 27 '14 at 16:47
  • There must be something terribly wrong with your NEON code. I'd check the disassembly. Either it's an improper implementation or the compiler messing up, maybe both. – Jake 'Alquimista' LEE Jun 29 '14 at 02:39
  • The thing with NEON is usually a hit-or-miss case. Either the task isn't NEON-able to start with, or NEON is an order of magnitude faster than the plain CPU version. There is no such thing as "a little bit faster" like your results show. You must have done something wrong with NEON. – Jake 'Alquimista' LEE Jun 29 '14 at 02:45
  • @Jake I am implementing an edge detection algorithm that does a convolution operation. My C version does it using two for loops; within those for loops the code calculates the X gradient and Y gradient (using two more for loops to process the 3x3 window). I used NEON intrinsics to parallelize the X and Y gradient operations (rough sketches of both loops appear below). – Kaliuday Jun 29 '14 at 04:44
  • It has been reported that, due to compiler limitations, NEON intrinsics do not generate good code compared with hand-written assembly. You will need to take a look at the disassembly of the generated code to determine whether this is happening. – rwong Jul 07 '14 at 05:53
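
The comments above describe the kernel being benchmarked. As a rough, hypothetical sketch (the function name, Sobel-style coefficients, and skipped border handling are assumptions, not the actual code), the plain C baseline would look something like this:

    #include <cstdint>

    // Scalar baseline: two outer loops over the image, two inner loops over a
    // 3x3 window, computing the X and Y gradients for each pixel.
    void edgeGradientsC(const uint8_t *src, int16_t *gx_out, int16_t *gy_out,
                        int width, int height) {
        static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};   // X kernel
        static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};   // Y kernel

        for (int y = 1; y < height - 1; ++y) {
            for (int x = 1; x < width - 1; ++x) {
                int gx = 0, gy = 0;
                for (int dy = -1; dy <= 1; ++dy) {        // 3x3 window
                    for (int dx = -1; dx <= 1; ++dx) {
                        int p = src[(y + dy) * width + (x + dx)];
                        gx += kx[dy + 1][dx + 1] * p;
                        gy += ky[dy + 1][dx + 1] * p;
                    }
                }
                gx_out[y * width + x] = static_cast<int16_t>(gx);
                gy_out[y * width + x] = static_cast<int16_t>(gy);
            }
        }
    }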

1 Answer


First, the power consumption of RenderScript code depends on the type of SoC, the frequencies/voltages at which the CPUs and GPUs operate, and so on.

Even if you look at CPUs from the same vendor, say ARM's A15 and A9, the A15 is more power hungry than the A9. Similarly, a Mali 4xx GPU versus a 6xx exhibits power consumption differences for the same task. In addition, power deltas exist between different vendors, for instance Intel and ARM CPUs, doing the same task. Similarly, one would notice power differences between a Qualcomm Adreno GPU and, say, an ARM Mali GPU, even if they are operating at the same frequency/voltage levels.

If you use a Nexus 5, you have a quad-core CPU cranking at 2.3 GHz per core. RenderScript pushes the CPUs and GPUs to their highest clock speed. So on this device, I would expect the power consumption of RS code based on the CPU/NEON (or just the CPU) to be highest, depending on the type of operations you are doing, followed by the RS GPU code. So, bottom line: on power consumption, the type of device you are using matters a lot because of the differences in the SoCs they use. On the latest generation of SoCs out there, I expect the CPUs/NEON to be more power hungry than the GPU.

RS will push the CPU/GPU clock frequency to the highest possible speed, so I am not sure one could do meaningful power optimizations there. Even if one could, those power savings would be minuscule compared to the power consumed by the CPUs/GPU at their top speed.

Power consumption is such a huge problem on mobile devices that you would probably be fine, from a power consumption angle, with your filters processing a few frames in the computational imaging space. But the moment one uses RenderScript for real video processing, the device heats up quickly even at lower video resolutions, and then the OS thermal managers come into play. These thermal managers reduce the overall CPU speeds, causing unreliable performance with CPU RenderScript.

Responses to comments

Frequency alone is not the cause of power consumption; it is the combination of frequency and voltage. For instance, a GPU running at, say, 200 MHz at 1.25 V may not consume much less power than one running at 550 MHz at 1.25 V, because the voltage has not come down. Depending on how the power domains are designed in the system, something like 0.9 V should be enough for 200 MHz, and the system should in theory transition the GPU power domain to a lower voltage when the frequency comes down. But various SoCs have various issues, so one cannot guarantee a consistent voltage and frequency transition. This could be one reason why GPU power can be high even for nominal loads.
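
As a rough rule of thumb (a standard first-order approximation, not something measured in this answer), dynamic switching power scales as P ≈ C · V² · f, where C is the switched capacitance, V the supply voltage, and f the clock frequency. So at a fixed 200 MHz, dropping the rail from 1.25 V to 0.9 V cuts the dynamic term by about (0.9 / 1.25)² ≈ 0.52, roughly half, and leakage also falls with voltage. That is why a power domain stuck at the higher voltage stays expensive even at a nominal clock.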

So, for whatever complexities, if you are holding the GPU voltage at something like 1.25 V @ 600 MHz, your power consumption will be pretty high, comparable to that of CPUs cranking at 2 GHz @ 1.25 V...

I tested NEON intrinsics for a 5x5 convolve and they are pretty fast (3x-5x) compared to plain CPU code for the same task. The NEON hardware is usually in the same power domain as the CPUs (the MPU power domain), so all the CPUs are held at that voltage/frequency even when only the NEON hardware is working. Since NEON performs the given task faster than the CPU, I wouldn't be surprised if it consumes relatively more power than the CPU for that task. Something has to give if you are getting faster performance, and it is obviously power.
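
For reference, here is a minimal sketch of the kind of NEON inner loop the questioner describes (the function name, pointer layout, and missing border handling are illustrative assumptions, not code from the question or this answer). It widens the 8-bit pixels to 16 bits and produces eight X-gradient values per call:

    #include <arm_neon.h>
    #include <cstdint>

    // Computes eight 3x3 Sobel-style X-gradient values at once. row0/row1/row2
    // point at the same x position in three consecutive image rows.
    static void sobelGx8(const uint8_t *row0, const uint8_t *row1,
                         const uint8_t *row2, int16_t *out) {
        // Load the left (x-1) and right (x+1) columns of each row and widen
        // the 8-bit pixels to signed 16-bit so the arithmetic cannot overflow.
        int16x8_t r0l = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(row0 - 1)));
        int16x8_t r0r = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(row0 + 1)));
        int16x8_t r1l = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(row1 - 1)));
        int16x8_t r1r = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(row1 + 1)));
        int16x8_t r2l = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(row2 - 1)));
        int16x8_t r2r = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(row2 + 1)));

        // Gx = (right column - left column), with the middle row weighted by 2.
        int16x8_t gx = vsubq_s16(r0r, r0l);
        gx = vaddq_s16(gx, vshlq_n_s16(vsubq_s16(r1r, r1l), 1));
        gx = vaddq_s16(gx, vsubq_s16(r2r, r2l));

        vst1q_s16(out, gx);   // store 8 gradient values
    }

Whether such a kernel actually beats the scalar version by the expected margin depends on memory access patterns and on checking the generated disassembly, as the comments above point out.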

  • @mahesh Thanks for your detailed answer. I am using a Nexus 5 for my testing. So you mean to say that the CPU and NEON versions are much more power hungry than the GPU version? I strongly feel that since the GPU runs at a lower clock frequency than the CPU, it must consume less power, presuming the RS runtime ports our filters onto the GPU. I am not sure why the device gets heated up when it uses RS on the GPU; isn't that contradicting your previous statement? I have a bit less info about NEON intrinsics performance, so I would like to know how NEON affects power consumption. My experiment shows that NEON (1.49 mW) is the highest. – Kaliuday Jun 29 '14 at 04:54