How can I insert repeated NOP statements using Visual C++'s inline assembler?

Question

Visual C++, using Microsoft's compiler, allows us to define inline assembly code using:

__asm {
    nop
}

What I need is a macro that makes it possible to repeat such an instruction n times, like:

ASM_EMIT_MULT(op, times)

for example:

ASM_EMIT_MULT(0x90, 160)

Is that possible? How could I do this?

https://msdn.microsoft.com/en-us/library/kyzds0ks.aspx , https://msdn.microsoft.com/en-us/library/352sth8z.aspx — Jose Manuel Abarca Rodríguez, Jul 20 '16 at 15:24
@JoseManuelAbarcaRodríguez thank you for the links, but these references do not make clear how I can solve the problem — Francisco Gomes, Jul 20 '16 at 16:03

score 6 · Accepted Answer · edited May 23 '17 at 11:44

With MASM, this is very simple to do. Part of the installation is a file named listing.inc (since everyone gets MASM as part of Visual Studio now, this will be located in your Visual Studio root directory/VC/include). This file defines a series of npad macros that take a single size argument and expand to an appropriate sequence of non-destructive "padding" opcodes. If you only need one byte of padding, you use the obvious nop instruction. But rather than using a long series of nops until you reach the desired length, Intel actually recommends other non-destructive opcodes of the appropriate length, as do other vendors. These pre-defined npad macros free you from having to memorize that table, not to mention making the code much more readable.

Unfortunately, inline assembly is not a full-featured assembler. There are a lot of things missing that you would expect to find in real assemblers like MASM. Macros (MACRO) and repeats (REPEAT/REPT) are among the things that are missing.

However, ALIGN directives are available in inline assembly. These will generate the required number of nops or other non-destructive opcodes to enforce alignment of the next instruction. Using this is drop-dead simple. Here is a very stupid example, where I've taken working code and peppered it with random aligns:

unsigned long CountDigits(unsigned long value)
{
   __asm
   {
      mov    edx, DWORD PTR [value]
      bsr    eax, edx
      align  4
      xor    eax, 1073741792
      mov    eax, DWORD PTR [4 * eax + kMaxDigits+132]
      align  16
      cmp    edx, DWORD PTR [4 * eax + kPowers-4]
      sbb    eax, 0
      align  8
   }
}

This generates the following output (MSVC's assembly listings use npad x, where x is the number of bytes, just as you'd write it in MASM):

PUBLIC CountDigits
_TEXT SEGMENT
_value$ = 8
CountDigits PROC
    00000 8b 54 24 04        mov   edx, DWORD PTR _value$[esp-4]
    00004 0f bd c2           bsr   eax, edx
    00007 90                 npad  1       ;// enforcing the "align 4"
    00008 35 e0 ff ff 3f     xor   eax, 1073741792
    0000d 8b 04 85 84 00     
          00 00              mov   eax, DWORD PTR _kMaxDigits[eax*4+132]
    00014 eb 0a 8d a4 24     
          00 00 00 00 8d     
          49 00              npad  12      ;// enforcing the "align 16"
    00020 3b 14 85 fc ff     
          ff ff              cmp   edx, DWORD PTR _kPowers[eax*4-4]
    00027 83 d8 00           sbb   eax, 0
    0002a 8d 9b 00 00 00     
          00                 npad  6       ;// enforcing the "align 8"
    00030 c2 04 00           ret   4
CountDigits ENDP
_TEXT   ENDS

If you don't actually want to enforce alignment, but just want to insert an arbitrary number of nops (perhaps as filler for later hot-patching?), then you can use C macros to simulate the effect:

#define NOP1   __asm { nop }
#define NOP2   NOP1  NOP1
#define NOP4   NOP2  NOP2
#define NOP8   NOP4  NOP4
#define NOP16  NOP8  NOP8
// ...
#define NOP64  NOP16 NOP16 NOP16 NOP16
// ...etc.

And then pepper your code as desired:

unsigned long CountDigits(unsigned long value)
{
   __asm
   {
      mov   edx, DWORD PTR [value]
      bsr   eax, edx
      NOP8
      xor   eax, 1073741792
      mov   eax, DWORD PTR [4 * eax + kMaxDigits+132]
      NOP4
      cmp   edx, DWORD PTR [4 * eax + kPowers-4]
      sbb   eax, 0
   }
}

to produce the following output:

PUBLIC CountDigits
_TEXT SEGMENT
_value$ = 8
CountDigits PROC
  00000 8b 54 24 04      mov   edx, DWORD PTR _value$[esp-4]
  00004 0f bd c2         bsr   eax, edx
  00007 90               npad  1     ;// these are, of course, just good old NOPs
  00008 90               npad  1
  00009 90               npad  1
  0000a 90               npad  1
  0000b 90               npad  1
  0000c 90               npad  1
  0000d 90               npad  1
  0000e 90               npad  1
  0000f 35 e0 ff ff 3f   xor   eax, 1073741792
  00014 8b 04 85 84 00
        00 00            mov   eax, DWORD PTR _kMaxDigits[eax*4+132]
  0001b 90               npad  1
  0001c 90               npad  1
  0001d 90               npad  1
  0001e 90               npad  1
  0001f 3b 14 85 fc ff
        ff ff            cmp   edx, DWORD PTR _kPowers[eax*4-4]
  00026 83 d8 00         sbb   eax, 0
  00029 c2 04 00         ret   4
CountDigits ENDP
_TEXT ENDS
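
The same doubling trick can probably be generalized to arbitrary bytes with `_emit`, which is closer to the `ASM_EMIT_MULT(0x90, 160)` asked for in the question. The following is only a sketch under assumptions: it targets 32-bit MSVC (inline assembly is not available for x64), and the names `REPn`, `EMIT_BYTE`, `count_160` and `patch_area` are made up for illustration. The `constexpr` counter is just a portable sanity check that the expansion count is right.

```cpp
// Hypothetical helper macros (not part of the original answer): repeat a
// token sequence a power-of-two number of times by doubling.
#define REP1(x)    x
#define REP2(x)    REP1(x)  REP1(x)
#define REP4(x)    REP2(x)  REP2(x)
#define REP8(x)    REP4(x)  REP4(x)
#define REP16(x)   REP8(x)  REP8(x)
#define REP32(x)   REP16(x) REP16(x)
#define REP64(x)   REP32(x) REP32(x)
#define REP128(x)  REP64(x) REP64(x)

// Portable sanity check: 128 + 32 expansions should add up to 160.
constexpr int count_160()
{
    int n = 0;
    REP128(++n;) REP32(++n;)
    return n;
}
static_assert(count_160() == 160, "REP128 + REP32 should expand 160 times");

#if defined(_MSC_VER) && defined(_M_IX86)
// Assumption: 32-bit MSVC. _emit places one literal byte in the code stream,
// and __asm acts as a statement separator, so repeats can share one line.
#define EMIT_BYTE(op)  __asm _emit op

void patch_area()
{
    // 160 = 128 + 32 copies of opcode 0x90 (nop)
    REP128(EMIT_BYTE(0x90)) REP32(EMIT_BYTE(0x90))
}
#endif
```

Any other count can be built the same way by combining the power-of-two macros, for example `REP8(...) REP2(...)` for 10 repetitions.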

Or, even cooler, we can use a bit of template meta-programming magic to get the same effect in style. Just define the following template function and its specialization (important to prevent infinite recursion):

template <size_t N> __forceinline void npad()
{
    npad<N-1>();
    __asm  { nop }
}
template <> __forceinline void npad<0>()  { }

And use it like this:

unsigned long CountDigits(unsigned long value)
{
   __asm
   {
      mov   edx, DWORD PTR [value]
      bsr   eax, edx
   }
   npad<8>();
   __asm
   {
      xor   eax, 1073741792
      mov   eax, DWORD PTR [4 * eax + kMaxDigits+132]
   }
   npad<4>();
   __asm
   {
      cmp   edx, DWORD PTR [4 * eax + kPowers-4]
      sbb   eax, 0
   }
}

That'll produce the desired output (exactly the same as the one just above) in all optimized builds—whether you optimize for size (/O1) or speed (/O2)—…but not in debugging builds. If you need it in debug builds, you'll have to resort to the C macros. :-(

Amazing! C++ template metaprogramming is awesome! It is a really bad thing it does not work for /Od as well :( — Francisco Gomes, Jul 21 '16 at 10:04
`npad()` is really nice, I wish we could also provide a byte to be inlined with `_emit`, like: `... void npad(BYTE byte) { ...; _asm { _emit byte } }` so we could emit a certain byte N times. — karliwson, Jul 23 '17 at 18:33
You should be able to do that, @karliwson. Just pass `byte` as a *template* parameter, rather than a function parameter. — Cody Gray - on strike, Jul 24 '17 at 10:54
Amazing!!!! Metaprogramming is the way to go. Sadly it doesn't work for C — Peter, Mar 07 '20 at 19:08
If you're having trouble using the metaprogramming example, you may want to start from here https://stackoverflow.com/questions/41102421/visual-c-forceinline-strange-behavior — Peter, Mar 07 '20 at 19:30
Sorry for the earlier pings. The question was migrated back here. Now just have to reopen it lol. If you were somehow involved in its migration back here - Thanks. — Michael Petch, Oct 24 '20 at 16:26

Peter · Answer 2 · 2020-03-07T20:02:19.113

This is based on Cody Gray's answer and its metaprogramming example, which uses template recursion together with inline or __forceinline, as in the code quoted here:

template <size_t N> __forceinline void npad()
{
    npad<N-1>();
    __asm  { nop }
}
template <> __forceinline void npad<0>()  { }

This won't work in Visual Studio without setting some options, and even then it is not guaranteed to work. As the documentation for warning C4714 puts it:

Although __forceinline is a stronger indication to the compiler than __inline, inlining is still performed at the compiler's discretion, but no heuristics are used to determine the benefits from inlining this function.

You can read more about this here https://learn.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4714?view=vs-2019
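
One way to find out whether the inlining actually happened is to make the compiler report C4714. The sketch below assumes that `#pragma warning(1 : 4714)` raises the warning to level 1 so it is visible at the usual warning levels. `NPAD_INLINE` is a hypothetical macro name, and the non-MSVC branches exist only so that the snippet still compiles elsewhere (without emitting any nops). In debug builds (/Od or /Ob0) no inlining is expected at all, so this check is mainly useful for optimized builds.

```cpp
#include <cstddef>

#if defined(_MSC_VER)
// Assumption: raise C4714 ("function marked as __forceinline not inlined")
// from level 4 to level 1 so it shows up at the default warning level.
#pragma warning(1 : 4714)
#define NPAD_INLINE __forceinline
#else
#define NPAD_INLINE inline   // fallback so the sketch compiles on other compilers
#endif

// Same recursion as in the accepted answer: npad<N> expands npad<N-1>
// and then adds one more nop.
template <std::size_t N> NPAD_INLINE void npad()
{
    npad<N - 1>();
#if defined(_MSC_VER) && defined(_M_IX86)
    __asm { nop }            // 32-bit MSVC only
#endif
}

// The specialization for 0 ends the recursion.
template <> NPAD_INLINE void npad<0>() { }
```

If the warning appears for `npad<N>`, the nops from that instantiation ended up behind a real call instead of being placed inline.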
