I'm working on an iPhone app that performs certain physics calculations thousands of times per second, and I'm optimizing the code to improve the frame rate. One of the pieces I'm looking at is the inverse square root. Right now I'm using the Quake 3 fast inverse square root method. After some research, however, I've read that the NEON instruction set offers a faster way. I'm unfamiliar with inline assembly and can't figure out how to use NEON. I tried the math-neon library, but I get compiler errors because most of its NEON-based functions lack a return statement.
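For reference, here is a minimal sketch of the Quake 3 style method mentioned above. It is not the exact code from my app: the function name `quake_rsqrt` is just illustrative, it uses `memcpy` instead of the original pointer cast to avoid aliasing problems, and it assumes a 32-bit IEEE 754 `float` and a positive, finite input.

```c
#include <stdint.h>
#include <string.h>

/* Quake 3 style fast inverse square root (illustrative name).
   Assumes 32-bit IEEE 754 floats and a positive, finite x. */
static float quake_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);     /* reinterpret the float's bits */
    bits = 0x5f3759dfu - (bits >> 1);   /* magic-constant initial estimate */
    memcpy(&y, &bits, sizeof y);

    y = y * (1.5f - half * y * y);      /* one Newton-Raphson refinement */
    return y;
}
```

With a single refinement step the result is only approximate (well under 1% relative error), which is the accuracy level the other methods are being compared against.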
EDIT: I've suddenly been getting some "unclear question" close votes. Although I think it's quite clear, and those who answered obviously understood, maybe some people need it stated explicitly: How do you use NEON to perform faster calculations? And is it really the fastest method for getting the inverse square root on the iPhone?
EDIT: I did some more formal testing on NEON vs. Quake today, but if anything, I'm even more uncertain about the outcome now:
In-App Testing: (An app that is currently in the App Store, with its invsqrt method modified)
- Quake Method (leading by a marginal increase in average FPS under stressful conditions)
- NEON (It was a really close call, but Quake seemed slightly faster)
- 1/sqrtf() (a bit more noticeable difference, 1-3 FPS drop).
"Formal" Testing: (An app that devours my phone's CPU. It times how long each method takes to get through an array of 10,000,000 randomly generated floats)
- NEON (clearly the fastest, and twice as fast when it is used to compute two inverse square roots at once).
- 1/sqrtf() (Only marginally slower than NEON. This surprising result leads me to deem this test "inconclusive" until I investigate further)
- Quake (This method, surprisingly, was a few orders of magnitude slower than the other two methods. This is especially surprising given its performance in the other test.)
While Quake vs. NEON was too close to call in the app performance test, the difference between Quake and 1/sqrtf() was clear-cut in that first test, and the second test was extremely consistent in the values it produced. What matters in the end, though, is app performance, so I'm going to make my final decision based on that test.