7

I'm writing a performance-critical, number-crunching C++ project where 70% of the time is spent in the 200-line core module.

I'd like to optimize the core using inline assembly, but I'm completely new to this. I do, however, know some x86 assembly syntaxes, including the ones used by GCC and NASM.

All I know:

I have to put the assembler instructions in _asm{} where I want them to be.

Problem:

  • I have no clue where to start. What is in which register at the moment my inline assembly comes into play?
qz-
toxic shock
  • Do remember that "just writing it in ASM" isn't going to make it any faster. In order for it to do that, you need to be better at optimizing ASM than the compiler. – Matti Virkkunen May 15 '10 at 10:39
  • Related: [C++ code for testing the Collatz conjecture faster than hand-written assembly - why?](https://stackoverflow.com/q/40354978). Often you can tweak the C++ source to hand-hold the compiler into making better asm. Also, MSVC inline asm syntax is bad for tiny blocks, because the inputs have to be in memory not registers. [What is the difference between 'asm', '\_\_asm' and '\_\_asm\_\_'?](https://stackoverflow.com/q/3323445). – Peter Cordes Aug 31 '18 at 20:13

6 Answers

13

You can access variables by their name and copy them to registers. Here's an example from MSDN:

int power2( int num, int power )
{
   __asm
   {
      mov eax, num    ; Get first argument
      mov ecx, power  ; Get second argument
      shl eax, cl     ; EAX = EAX * ( 2 to the power of CL )
   }
   // Return with result in EAX
}

Using C or C++ in ASM blocks might also be of interest to you.
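For comparison, here is a rough sketch of the same shift written in GNU-style extended asm, assuming GCC or Clang targeting x86 with the default AT&T syntax. The name `power2_gnu` is made up for this illustration, and MSVC does not accept this syntax. The point is that the operands are declared to the compiler instead of being loaded from memory by hand.

```cpp
// Sketch only: GNU-style counterpart of the MSDN power2 example.
// power2_gnu is a hypothetical name used for illustration.
int power2_gnu(int num, int power)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result = num;
    // "+r": result is read and written, in any general-purpose register.
    // "c":  power must be placed in ECX, because SHL takes its count from CL.
    // "cc": the shift modifies the flags register.
    asm("shll %%cl, %0" : "+r"(result) : "c"(power) : "cc");
    return result;
#else
    // Fallback for other compilers or targets: plain C++ shift.
    return num << power;
#endif
}
```

Because the inputs, outputs and clobbers are spelled out, the compiler can keep `num` and `power` in registers and inline the function without saving registers it does not need to.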

ThiefMaster
  • For the record, falling off the end of a non-`void` function after leaving something in EAX *is* apparently supported in MSVC, even if the function inlines. This is completely different from GCC with GNU-style `asm("template" : operands);`, and even Clang's `-fasm-blocks` syntax that looks like MSVC-style inline asm doesn't support that. In all other cases, falling off the end of a non-`void` function is undefined behaviour. – Peter Cordes Aug 31 '18 at 20:24
10

The Microsoft compiler is very poor at optimisation when inline assembly gets involved. It has to back up registers, because if you use eax it won't move its own value from eax to another free register; it will continue using eax. GCC's inline assembly is far more advanced on this front.

To get round this, Microsoft started offering intrinsics. These are a far better way to do your optimisation, as they allow the compiler to work with you. As Chris mentioned, inline assembly also doesn't work under x64 with the MS compiler, so on that platform you REALLY are better off just using the intrinsics.

They are easy to use and give good performance. I will admit I am often able to squeeze a few more cycles out of it by using an external assembler, but they're bloody good for the productivity improvement they provide.
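As a rough illustration of what intrinsics look like, here is a minimal sketch of a dot product using SSE intrinsics from `<emmintrin.h>`. The function name `dot_product` and the macro `CORE_HAVE_SSE2` are made up for this example, and the kernel is only a stand-in for whatever the real core module computes. On targets without SSE2 the sketch falls back to the plain loop.

```cpp
#include <cstddef>

// CORE_HAVE_SSE2 is a hypothetical macro for this sketch.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define CORE_HAVE_SSE2 1
#endif

// Sum of element-wise products of two float arrays of length n.
float dot_product(const float* a, const float* b, std::size_t n)
{
    std::size_t i = 0;
    float total = 0.0f;
#ifdef CORE_HAVE_SSE2
    __m128 acc = _mm_setzero_ps();              // four partial sums
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);        // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                  // spill the partial sums
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i)                          // scalar tail (or whole loop)
        total += a[i] * b[i];
    return total;
}
```

The compiler still does register allocation and scheduling around the intrinsics, which is the "work with you" part; the same source also builds for 32-bit and 64-bit targets.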

evandrix
Goz
  • I wouldn't call the compiler poor, it really has no way to know whether you meant to change a register for the following C code or you want to write your own little bit of code without affecting the rest of the world. When people blame compilers, I generally find that they simply don't understand all that's involved. – Blindy May 16 '10 at 11:33
  • I CAN say it's poor in those circumstances because it IS poor. I DO know what's going on there and I ALSO know that Microsoft recommends against using inline assembly for EXACTLY that reason. – Goz May 17 '10 at 05:59
  • @Blindy "it really has no way to know whether..." - Well, GCC does know that. Because it provides means to declare the 'interface' of the inline assembly to the rest of the program. – JimmyB Apr 26 '16 at 15:30
  • Sure, but GCC has a different looking `asm` keyword. I'm talking about VC itself, which, for backwards compatibility reasons, will most likely keep the simplistic `asm` they have where you can't define the registers you're going to use. – Blindy Apr 27 '16 at 04:47
  • @Blindy: Not only is MSVC-style inline asm syntax not good for performance unless you write whole loops in it (input operands have to be in memory), the actual implementation of [it is / was poor and brittle](https://stackoverflow.com/questions/3323445/what-is-the-difference-between-asm-asm-and-asm#comment59576185_35959859). e.g. apparently it wasn't even safe to use in a function with a register-args calling convention. There's no reason why that *should* be a problem; the compiler can just spill if it can't prove that your code doesn't clobber registers. – Peter Cordes Aug 31 '18 at 20:18
6

You cannot rely on anything being in the registers when the _asm block is executed. You need to move values into the registers yourself. If there is a variable 'a', then you would need to write:

__asm {
  mov eax, [a]
}

It is worth pointing out that VS2010 comes with Microsoft's assembler. Right-click on a project, go to build rules, and turn on the assembler build rules; the IDE will then process .asm files.

This is a somewhat better solution, as VS2010 supports 32-bit AND 64-bit projects and the __asm keyword does NOT work in 64-bit builds. You MUST use an external assembler for 64-bit code :/

Chris Becke
3

I prefer writing entire functions in assembly rather than using inline assembly. This allows you to swap out the high level language function with the assembly one during the build process. Also, you don't have to worry about compiler optimizations getting in the way.

Before you write a single line of assembly, print out the assembly language listing for your function. This gives you a foundation to build upon or modify. Another helpful tool is the interweaving of assembly with source code. This will tell you how the compiler is coding specific statements.

If you need to insert inline assembly for a large function, make a new function for the code that you need to inline. Again replace with C++ or assembly during build time.
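One way to arrange the build-time swap is sketched below, under a few assumptions: `USE_ASM_CORE` is a hypothetical macro set by the build system, `core_sum_squares` is a made-up stand-in for the real hot function, and the assembly version would live in a separate .asm file that is not shown here. `extern "C"` is used so the symbol name is unmangled and easy to match from the assembler side.

```cpp
#include <cstddef>

#ifdef USE_ASM_CORE
// Hand-written version, assembled separately and linked in (not shown).
extern "C" double core_sum_squares(const double* data, std::size_t n);
#else
// Reference C++ version: the default, and the baseline to check the
// assembly version against.
extern "C" double core_sum_squares(const double* data, std::size_t n)
{
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i] * data[i];
    return total;
}
#endif
```

Callers use `core_sum_squares` the same way in both configurations, so the two builds can be compared for both correctness and speed.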

These are my suggestions, Your Mileage May Vary (YMMV).

Thomas Matthews
1

Go for the low hanging fruit first...

As others have said, the Microsoft compiler is pretty poor at optimisation. You may be able to save yourself a lot of effort just by investing in a decent compiler, such as Intel's ICC, and re-compiling the code "as is". You can get a 30-day free evaluation license from Intel and try it out.

Also, if you have the option to build a 64-bit executable, then running in 64-bit mode can yield a performance improvement of around 30%, due to the doubling of the number of available general-purpose registers (from 8 to 16).

Paul R
1

I really like assembly, so I'm not going to be a nay-sayer here. It appears that you've profiled your code and found the 'hotspot', which is the correct way to start. I also assume that the 200 lines in question don't use a lot of high-level constructs like vector.

I do have to give one bit of warning: if the number-crunching involves floating-point math, you are in for a world of pain, specifically a whole set of specialized instructions, and a college term's worth of algorithmic study.

All that said: if I were you, I'd step through the code in question in the VS debugger, using the Disassembly view. If you feel comfortable reading the code as you go along, that's a good sign. After that, do a Release compile (Debug turns off optimization) and generate an ASM listing for that module. Then if you think you see room for improvement...you have a place to start. Other people's answers have linked to the MSDN documentation, which is really pretty skimpy but still a reasonable start.
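Whatever you change, it helps to time the hot function before and after. Here is a minimal timing sketch using `std::chrono`; `sum_squares`, `average_ms` and `g_sink` are hypothetical names for this illustration, and the kernel is only a placeholder for the real 200 lines.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Placeholder for the real hot function.
double sum_squares(const std::vector<double>& v)
{
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i] * v[i];
    return total;
}

// Writing results to a volatile keeps the optimizer from discarding the calls.
volatile double g_sink = 0.0;

// Runs kernel() `repeats` times and returns the average time per call in ms.
template <class Kernel>
double average_ms(Kernel kernel, int repeats)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i)
        g_sink = kernel();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}
```

A call such as `average_ms([&] { return sum_squares(v); }, 1000)` gives a number to compare between the compiler's output and any hand-tuned version. Measure the Release build, since Debug timings say little about optimized code.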

egrunin