Can assembly make program written in C faster?

Question

I am writing a program in C and need to make it significantly faster, as it is being assessed on performance. Can assembly code make a C program faster at any level? Is it possible to shorten the program's run time by replacing some parts of the C code with assembly (for example, huge for loops)?

Thanks.

Possibly, although optimizing compilers are pretty smart these days. You probably can't beat them. Optimize your algorithms first. A quicksort written in C will vastly out-perform a bubble-sort written in ASM. — Blorgbeard, Aug 12 '14 at 04:02
Yes, but you might also make it slower as you take away the compiler's ability to perform optimizations. Who's better writing optimized assembly, you or the compiler? Optimize your algorithm instead. It's impractical to give generalized advice on C vs ASM. — Casper Beyer, Aug 12 '14 at 04:03
Define the nature of your problem in concrete terms. Your question is otherwise too abstract to be answerable on SO. — Alex Reynolds, Aug 12 '14 at 04:04
Maybe you should tell us more about the code you are trying to optimize... Is it an algorithm which could be broken into parallel components? — Ron, Aug 12 '14 at 04:25
Usually improving the algorithm is the most rewarding approach. Hoping for magic gains with asm usually ends in disappointment. Compilers are quite good at their jobs. — David Heffernan, Aug 12 '14 at 07:20
Basically a duplicate of [Is inline assembly language slower than native C++ code?](https://stackoverflow.com/q/9601427) / [When is assembly faster than C?](https://stackoverflow.com/q/577554) — Peter Cordes, Oct 23 '21 at 18:12

score 5 · Answer 1 · 2014-08-12T04:33:55.833

Replacing C code with assembly can make your code faster if and only if one or more of the following is true:

Your compiler is generating terrible code.
You have forgotten to enable optimizations.
The C code you are replacing was unusually inefficient.
The assembly you are writing is making use of CPU features that the compiler cannot, such as vector operations or task-specific primitives like crypto acceleration. (Note that some modern compilers can automatically vectorize code, too, albeit not always very well.)

If none of the preceding conditions is true, you'll be wasting your time.
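Before reaching for assembly, it is worth ruling out the second and fourth conditions in C itself. A minimal sketch, assuming GCC or Clang with optimizations enabled (for example `-O3`); the function name `add_arrays` is made up for illustration. The `restrict` qualifiers promise that the arrays do not overlap, which is often what the compiler needs before it will vectorize a loop like this:

```c
#include <stddef.h>

/* Element-wise sum of two float arrays.
   `restrict` tells the compiler that dst, a and b never alias,
   so it is free to load and store several elements at once. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

GCC's `-fopt-info-vec` and Clang's `-Rpass=loop-vectorize` report whether the loop was vectorized; if it was, hand-written vector assembly is unlikely to gain much here.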

edited Aug 12 '14 at 04:33

answered Aug 12 '14 at 04:04

+1, totally true. I'd say it's pretty useless to code in assembly these days. Compilers are far better at doing it. – Filipe Gonçalves Aug 12 '14 at 04:05
Not to mention that it's difficult to tell when the compiler is actually generating terrible code. – Mysticial Aug 12 '14 at 04:06
@FilipeGonçalves I suppose it's still somewhat useful if you have ridiculously tight power or memory constraints. – Cubic Aug 12 '14 at 04:25
@Cubic Correct. But note that I only said that these are conditions where using assembly will make your code *faster* — size constraints are another issue entirely. – Aug 12 '14 at 04:31

score 2 · Answer 2 · answered Aug 12 '14 at 16:52

Can recasting stuff in assembler make your program faster ? Yes. Significantly faster ? That depends on where the bottle-neck is.

With your modern processor, saving instructions does not necessarily save processing time. Scheduling operations to make best use of overlapping execution may do better, even if more instructions are involved. The rules are complicated and not (in my experience) well documented, and vary from processor to processor... and are probably better suited to machine generation of instructions than your human programmer. The processor is built to run machine generated code ! Cleaner, hand-crafted code may look prettier, but may not run any faster :-(

For small fragments of critical code, a human can be better at making use of special purpose instructions in ways particularly well suited to the special needs of the task. A human can also do better where they can take advantage of special properties of the problem. And in assembler the human may be able to push even general purpose instructions to get more out of them. Working with the branch predictor can help, and the human can know more about what the code is going to do than the compiler can deduce from what's written. Similarly, the human may do better at dropping hints to the cache management for pre-reads etc. In short, the human can (still) do better in specialised areas where general purpose code generation cannot be expected to produce the best result.
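Some of these hints can be given without leaving C. A minimal sketch, assuming GCC or Clang, whose `__builtin_expect` and `__builtin_prefetch` builtins are used here. The macro names, the function `sum_nonnegative` and the prefetch distance of 16 elements are made up for illustration, and whether either hint helps at all has to be measured:

```c
#include <stddef.h>

/* GCC/Clang builtins; on other compilers the macros become no-ops. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define UNLIKELY(x) (x)
#define PREFETCH(p) ((void)0)
#endif

/* Sums the non-negative elements of v.
   Assumption for this sketch: negative values are rare, so that
   branch is marked unlikely. */
long sum_nonnegative(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Ask for data a few iterations ahead (distance is a guess). */
        if (i + 16 < n)
            PREFETCH(&v[i + 16]);
        if (UNLIKELY(v[i] < 0))
            continue;
        total += v[i];
    }
    return total;
}
```

For a simple sequential scan like this the hardware prefetcher usually does the job already, so treat the prefetch as a placeholder for less regular access patterns.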

In larger pieces of code, the human may do better by not being bound by the ABI. The human may be able to allocate key information to registers across many functions, and have some functions take their arguments and return results in ways which are convenient for the callers, and which don't require shuffling things around all the time between calls. Also, the human may be better at allocating stuff in memory to help the cache, given a better global view of the problem. In short, the human can (still) do better armed with a wider view of the problem.

However, none of this is going to come cheap ! And it may be necessary to try more than one approach to hand-optimising the code, and some careful measurement to ensure it is indeed better.

Of course, this is all assuming you are writing for a "big" processor -- which you didn't specify. If you are writing for an itty-bitty PIC (say), older rules apply.

And, of course, the oldest rules of all when it comes to code optimisation:

1. don't do it: find a better algorithm
2. don't do it: find a better data structure
3. don't do it: repeat (1) and (2)
4. don't do it... unless you have a piece of code which is critical to the running time... and even then only optimise the bit(s) that matter.

Believe me, as an assembler programmer it pains me to say these things ! But you need a particular sort of problem to make it worth the time and effort involved in carefully crafting effective assembler code.

score 1 · Answer 3 · answered Aug 12 '14 at 04:06

Simply rewriting C code in assembler is extremely unlikely to speed things up. In fact, it has a better chance of slowing things down. Modern compilers are very good at generating assembly code that is as efficient as possible an expression of the algorithm coded in C. Just be sure to turn on the compiler's optimization option(s).

Performance gains are most likely obtainable by a variety of strategies that can be expressed in C:

loop unrolling
eliminating unnecessary calculations
pre-calculating values
replacing algorithms with more efficient ones
replacing data structures with more efficient ones
paying attention to locality of reference and reorganizing calculations to achieve it

Many other techniques can be applied as well, but without knowing more about your problem, it's impossible to say what might apply.
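To make the locality-of-reference item concrete, here is a minimal sketch with two made-up functions that compute the same sum over a matrix stored as a flat row-major array. Only the loop order differs, and on large matrices the first version typically runs noticeably faster because it touches memory in the order it is laid out:

```c
#include <stddef.h>

/* Cache-friendly: the inner loop walks consecutive addresses. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same result, but the inner loop jumps `cols` ints per step,
   so each access may land on a different cache line. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

An optimizer may sometimes interchange such loops on its own, so the difference is best confirmed by timing both versions on your real data sizes.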

hyde · Answer 4 · 2014-08-12T05:14:51.850

A good compiler will do a good job at optimization, so even if you are an expert and willing to spend time finding the optimal assembly for some task, you normally won't gain much. Hand-written assembly may be worth it for some inner loops, say in a game where the loop runs once per pixel per frame at 60fps. But if you don't know the CPU very well, you could also make it worse than a compiler, because optimal assembly is not always intuitive. Modern CPUs and memory architectures are complex beasts.

For 99% of performance problems, just forget that. And for the remaining 1%, don't consider it before you have done the other optimizations (see below). Otherwise you most likely don't have the right inner loop to optimize. Hand-writing assembly is the last step, for example to squeeze out a few more FPS after having optimized everything else to the limit.

Instead, the first thing to do for performance is to find the bottlenecks by profiling and benchmarking. You also need these measurements to know whether any optimization you made actually improved anything or made things worse (not uncommon when you forget to take some detail into account).
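A profiler tells you where the time goes; for a quick before/after comparison of a single routine, a small timing harness is often enough. A minimal sketch using only the standard `clock()` function; `time_cpu_seconds` and `busy_work` are hypothetical names, and the workload is just a stand-in for the code you actually want to measure:

```c
#include <time.h>

/* Returns the CPU time, in seconds, spent calling fn(arg) `reps` times. */
double time_cpu_seconds(void (*fn)(void *), void *arg, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Stand-in workload: adds up 0..n-1. The volatile accumulator keeps
   the compiler from deleting the loop as dead code. */
void busy_work(void *arg)
{
    unsigned long n = *(unsigned long *)arg;
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < n; i++)
        sink += i;
}
```

Run each variant several times with enough repetitions that the total is well above the clock's resolution, and compare builds made with the same optimization flags.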

The primary way to improve performance is then selecting the right (sub-)algorithms and data structures. For example, switching from insertion sort to a simple quicksort may be a massive improvement, unless the data is already sorted, in which case it will be a massive penalty. You can then improve quicksort to also handle sorted data (by randomizing it), adapt your algorithm at runtime if you know the data is sorted, switch to merge sort, and so on. This leverages decades of hard work by hundreds of very smart computer scientists who invented the commonly used algorithms.
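One reading of "randomizing" is to pick the pivot at random, so that already-sorted input no longer triggers quicksort's worst case. A toy sketch under that assumption; the names are made up, it uses `rand()` for brevity, and it still degrades on inputs with very many equal keys:

```c
#include <stddef.h>
#include <stdlib.h>

static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Quicksort with a randomly chosen pivot (Lomuto partition).
   Recurses into the smaller part and loops on the larger one,
   which keeps the recursion depth logarithmic. */
void quicksort_random(int *v, size_t n)
{
    while (n > 1) {
        /* Move a random element to the end and use it as the pivot. */
        size_t p = (size_t)rand() % n;
        swap_int(&v[p], &v[n - 1]);
        int pivot = v[n - 1];

        /* Everything smaller than the pivot goes to the front. */
        size_t store = 0;
        for (size_t i = 0; i + 1 < n; i++) {
            if (v[i] < pivot) {
                swap_int(&v[i], &v[store]);
                store++;
            }
        }
        swap_int(&v[store], &v[n - 1]); /* pivot into its final slot */

        size_t left = store;
        size_t right = n - store - 1;
        if (left < right) {
            quicksort_random(v, left);
            v += store + 1;
            n = right;
        } else {
            quicksort_random(v + store + 1, right);
            n = left;
        }
    }
}
```

In real code the standard library's `qsort` is usually the first thing to try; the point here is only that the fix lives at the algorithm level, not in the instruction stream.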

Then there is the optimization of your own algorithms: bringing down their complexity, for example by using dynamic programming techniques, organizing your data well, using the right data structures...
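As a textbook illustration of what dynamic programming buys (not taken from any particular program), compare a naive recursive Fibonacci with a bottom-up version. Both function names are made up; the first takes exponential time, the second linear time:

```c
#include <stdint.h>

/* Naive recursion: recomputes the same subproblems over and over. */
uint64_t fib_naive(unsigned n)
{
    return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
}

/* Bottom-up dynamic programming: each value is computed once from
   the two before it. Results fit in 64 bits for n <= 93. */
uint64_t fib_dp(unsigned n)
{
    if (n == 0)
        return 0;
    uint64_t prev = 0, curr = 1;
    for (unsigned i = 1; i < n; i++) {
        uint64_t next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

No amount of hand-tuned assembly applied to `fib_naive` will catch up with `fib_dp` for large `n`.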

Ron · Answer 5 · 2014-08-12T04:36:45.070

If you are convinced you've optimized your C level code as much as possible you might want to look into exploiting the parallel processing abilities inherent in most modern microprocessors. You might want to look into OpenMP.

This assumes that the particular algorithm consuming all your program's time is parallelizable... If you're really hard core you might look into OpenCL or CUDA to exploit the massive parallel processing abilities of your GPU (assuming your platform has one...). That big for loop you were talking about... could the problem be split up so that several for loops could work on it at the same time?

You're much more likely to succeed by pursuing the above (if it's possible with your particular program) than by trying to beat the compiler with hand-optimized assembly.
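For a loop whose iterations are independent, the OpenMP version can be as small as one pragma. A minimal sketch, assuming a compiler with OpenMP support (for example GCC or Clang with `-fopenmp`); the function `dot` is made up for illustration:

```c
#include <stddef.h>

/* Dot product of two arrays. The iterations only interact through
   `sum`, which the reduction clause combines safely across threads.
   The #ifdef lets the file build cleanly without OpenMP as well,
   in which case the loop simply runs on one thread. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (long i = 0; i < (long)n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Starting threads has a cost, so this only pays off when the loop does enough work per call; measure with and without `-fopenmp`.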

If you need a more fine-tuned solution than OpenMP, I suggest going with pthreads. Once you have learned pthreads you will automatically have gained a far greater understanding of C than before. — , Aug 12 '14 at 20:02

score 0 · Answer 6 · answered Aug 12 '14 at 06:04

The short answer to each of the questions you posed is "Yes". However, this does not mean that writing parts of your code in assembly instead of C is worth it.

There are a number of questions you should ask and answer yourself first.

Is your program fast enough as it is currently written?
Have you profiled it? That is, do you know where the bottlenecks are in your program? Focus on the areas that will give you the most bang for your buck.
Before writing parts of it in assembly, are there any algorithmic changes (still in C) that you can make that may speed things up? Again, focus on the areas that will give you the most bang for your buck.
Is it practical to speed things up with faster hardware?
Do you understand the various compiler optimization settings? Are you using the relevant ones?
Have you analyzed what the compiler is generating?
Is there room for optimization?
Are your optimization goals realistic?

If after all this you still think your program needs some assembly, try it. Remember that even though you may write something in assembly, it does not automatically mean that it will be faster than what the compiler generated. (After all, the compiler generates assembly too.)

score 0 · Answer 7 · answered Aug 12 '14 at 16:08

I would dissent from those who suggest that compiler optimizations cannot be improved upon by digging around in the assembly. All of the recommendations above about improving the algorithm are valid and I think that it is an important first step. However, once you have a piece of code that you've refined and optimized as much as you can in a higher level language there may yet be some utility in working through the code with a dis-assembler to find if there are any bottlenecks in your code.

Another point of observation would be that different languages and even compilers within the same language generate system code within the executable that is structured differently. You might be able to trim some of the fat if you are targeting a specific architecture, but you've really gotta know very specifically what routines you need to import, everything that the operating system expects you to do and further, what it forbids you from doing.

You have a valid point about reducing the size of the run-time. If you're only using a specific routine, like puts, and you are including the whole of conio and stdio, then you would be able to eliminate a sizeable chunk of code that you aren't using by introducing assembly into the mix rather than using a standard library. But breaking away from standards compliance can be problematic; the user will be able to tell immediately if your software is poorly implemented when expected behaviors start to fail. (An example is the ability to pipe the program output through less or more: it may be faster to use BIOS routines, but this ability fails when you give the OS the cold shoulder.)
