Is there a macro for inline assembly to emit a C string as a sequence of bytes using the __emit keyword?

For instance, to emit "Hello", you have to write:

__emit 'H'
__emit 'e'
__emit 'l'
__emit 'l'
__emit 'o'
__emit 0x0

Would anybody happen to know of a macro that would instead allow one to just write EMIT_MACRO("Hello")?

TheRealChx101
  • A data directive that emits an entire string literal is not available for inline assembly. You'd have to go with a pure assembly implementation instead. [Compiling Assembly from Visual Studio](http://stackoverflow.com/q/4548763/1889329) is fairly easy. As an added bonus, this also lets you target x64 or ARM architectures, which aren't supported through inline assembly. – IInspectable Mar 27 '15 at 12:07
  • @IInspectable Thanks, but I'm aware of that. I was just wondering if it's possible to have a macro that iterates over the string literal and expands to __emit BYTE for each byte during the compilation stage. – TheRealChx101 Mar 27 '15 at 17:15
  • My goal was to be lazy and run away from having to manually __emit each byte. I know this keyword was meant for code but it would be nice to be able to do so. – TheRealChx101 Mar 27 '15 at 17:18
  • As I hinted previously, you **can** emit multiple bytes in a single statement. The data directive [DB](https://msdn.microsoft.com/en-us/library/vstudio/8f6k0he8.aspx) does exactly that. It's just not available when using inline assembly. It's also quite unusual to emit a sequence of characters into the .text segment. It certainly isn't executable code, so what's behind this requirement? When using inline assembly, you can very easily define a (local) array initialized with a string literal, and access that from assembly code. – IInspectable Mar 27 '15 at 18:41
  • Well, I wanted an inline assembly approach. Also, the .text segment is readable most if not all of the time, so I don't see why you can't read from it. Perhaps the question should be about macros and the preprocessor: how to iterate through a string at compile time. Like I said, I don't want to have to write an emit for each byte of a big-ass string embedded in inline assembly code inside inline functions. – TheRealChx101 Mar 27 '15 at 19:41
  • An inline assembly approach would use `const char txt[] = "Hello";` and access that from the inline assembly. Using `__emit` to insert **data** is awkward. You'll spend/waste at least 2 bytes only to jump over code that isn't. What's the rationale behind that? What requirement are you trying to meet that you aren't telling us about? – IInspectable Mar 27 '15 at 22:15
  • Lol. I'm not worried about spending 2+ bytes for a jump/call. In fact, that's the idea: http://pastebin.com/cm5Mu5UQ I hope that makes sense. – TheRealChx101 Mar 27 '15 at 23:25

1 Answer

TL;DR: Not possible.

Hang on, not all is lost just yet. Up front, here's a teaser of what you can achieve: EMIT_STRING(H,e,l,l,o,!)

But let me expand on the blunt statement in the introduction first: performing an operation for each element in a sequence requires some sort of iteration or recursion. Either way, you need a termination criterion to, well, terminate the expansion. While it is possible to have recursive macros through deferred expansion, there's no way to tell the preprocessor when to stop (citation needed; feel free to contribute).

So that precludes the use of string literals, and puts the burden of splitting a string literal into a sequence of character literals on the developer. Unfortunate, but maybe not the end of the world. A naïve implementation would look like this:

#define EMIT1(c1) __asm _emit c1
#define EMIT2(c1, c2) EMIT1(c1) __asm _emit c2
#define EMIT3(c1, c2, c3) EMIT2(c1, c2) __asm _emit c3
...
#define EMIT63(c1, c2, ..., c63) EMIT62(c1, c2, ..., c62) __asm _emit c63

Sample: EMIT7('H','e','l','l','o','!','\0'). A step in the right direction, but not entirely convincing. For one, having to pick the right macro depending on the argument count is error-prone. Let's try to have the compiler pick the right one for us (based on Overloading Macro on Number of Arguments):

// get number of arguments with __NARG__
#define __ARG_N( \
      _1, _2, _3, _4, _5, _6, _7, _8, _9,_10, \
     _11,_12,_13,_14,_15,_16,_17,_18,_19,_20, \
     _21,_22,_23,_24,_25,_26,_27,_28,_29,_30, \
     _31,_32,_33,_34,_35,_36,_37,_38,_39,_40, \
     _41,_42,_43,_44,_45,_46,_47,_48,_49,_50, \
     _51,_52,_53,_54,_55,_56,_57,_58,_59,_60, \
     _61,_62,_63,N,...) N
#define __RSEQ_N() \
     63,62,61,60,                   \
     59,58,57,56,55,54,53,52,51,50, \
     49,48,47,46,45,44,43,42,41,40, \
     39,38,37,36,35,34,33,32,31,30, \
     29,28,27,26,25,24,23,22,21,20, \
     19,18,17,16,15,14,13,12,11,10, \
     9,8,7,6,5,4,3,2,1,0
#define __NARG_I_(...) __ARG_N(__VA_ARGS__)
#define __NARG__(...)  __NARG_I_(__VA_ARGS__,__RSEQ_N())

// general definition for any function name
#define _VFUNC_(name, n) name##n
#define _VFUNC(name, n) _VFUNC_(name, n)
#define VFUNC(func, ...) _VFUNC(func, __NARG__(__VA_ARGS__)) (__VA_ARGS__)

// definition for EMIT_STRING
#define EMIT_STRING(...) VFUNC(EMIT, __VA_ARGS__)
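
To see what the counting part does on its own, here is a minimal sketch trimmed to 8 slots instead of 63. ARG_N, RSEQ_N, NARG_I and NARG are renamed stand-ins chosen for this illustration (identifiers starting with a double underscore are reserved), and the sketch assumes a conforming C11 toolchain such as GCC or Clang rather than MSC. The arguments push the reversed sequence to the right, so the value that lands in slot N equals the argument count:

```c
/* Trimmed to 8 slots for illustration; the macros above use 63. */
#define ARG_N(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N
#define RSEQ_N() 8, 7, 6, 5, 4, 3, 2, 1, 0
#define NARG_I(...) ARG_N(__VA_ARGS__)
#define NARG(...) NARG_I(__VA_ARGS__, RSEQ_N())

/* Seven arguments shift the reversed sequence so that 7 lands in slot N. */
_Static_assert(NARG('H', 'e', 'l', 'l', 'o', '!', '\0') == 7, "7 arguments");
/* A single argument leaves 1 in slot N. */
_Static_assert(NARG('H') == 1, "1 argument");
```

One caveat of this technique: an empty argument list still counts as 1, because the preprocessor treats it as a single empty argument.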

With EMIT_STRING defined as above, the invocation would be EMIT_STRING('H','e','l','l','o','!','\0'), if only MSC would compile it. As it turns out, it doesn't: MSC implements __VA_ARGS__ expansion in a way that is arguably correct, yet less useful. Luckily, "all problems in computer science can be solved by another level of indirection", and this is no exception (see MSVC doesn't expand __VA_ARGS__ correctly):

#define EXPAND( x ) x

#define __NARG_I_(...) EXPAND(__ARG_N(__VA_ARGS__))

#define VFUNC(func, ...) EXPAND(_VFUNC(func, __NARG__(__VA_ARGS__)) (__VA_ARGS__))

#define EMIT_STRING(...) VFUNC(EMIT, __VA_ARGS__) __asm _emit '\0'

Note that I silently appended __asm _emit '\0' to the EMIT_STRING macro, so that the NUL-terminator doesn't have to be added explicitly. With that in place we can write EMIT_STRING('H','e','l','l','o','!') already.
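
To try the dispatch outside of MSC, here is a sketch in portable C that should preprocess the same way under GCC or Clang. Since __asm _emit is MSC-only, it is swapped for emit_byte, a hypothetical helper that appends to a buffer. The table is cut down to 4 slots, and the reserved-looking names are renamed (NARG, VFUNC_I and so on). All of that is an assumption made for illustration, not part of the macros above:

```c
#include <stddef.h>

/* Hypothetical stand-in for `__asm _emit`: appends one byte to a buffer. */
static unsigned char emitted[16];
static size_t emitted_len = 0;

static void emit_byte(unsigned char b)
{
    if (emitted_len < sizeof emitted)
        emitted[emitted_len++] = b;
}

/* One macro per argument count, each building on the previous one. */
#define EMIT1(c1)             emit_byte(c1);
#define EMIT2(c1, c2)         EMIT1(c1) emit_byte(c2);
#define EMIT3(c1, c2, c3)     EMIT2(c1, c2) emit_byte(c3);
#define EMIT4(c1, c2, c3, c4) EMIT3(c1, c2, c3) emit_byte(c4);

/* Extra indirection needed by MSC; harmless on other compilers. */
#define EXPAND(x) x

/* Argument counting, trimmed to 4 slots. */
#define ARG_N(_1, _2, _3, _4, N, ...) N
#define RSEQ_N() 4, 3, 2, 1, 0
#define NARG_I(...) EXPAND(ARG_N(__VA_ARGS__))
#define NARG(...) NARG_I(__VA_ARGS__, RSEQ_N())

/* Paste the count onto the base name, then call the selected macro. */
#define VFUNC_(name, n) name##n
#define VFUNC_I(name, n) VFUNC_(name, n)
#define VFUNC(func, ...) EXPAND(VFUNC_I(func, NARG(__VA_ARGS__)) (__VA_ARGS__))

/* Dispatches to EMIT1..EMIT4 and appends the NUL terminator. */
#define EMIT_STRING(...) VFUNC(EMIT, __VA_ARGS__) emit_byte('\0');

static void emit_hi(void)
{
    EMIT_STRING('H', 'i', '!')
}
```

Calling emit_hi() leaves the four bytes 'H', 'i', '!', 0 in emitted, which mirrors what the real macro would place in the instruction stream.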

EMIT_STRING('H','e','l','l','o','!') is still not quite what I promised at the top. To take it one step further, you can use the Charizing Operator (#@) (Microsoft Specific):

#define EMIT1(c1) __asm _emit #@c1
#define EMIT2(c1, c2) EMIT1(c1) __asm _emit #@c2
...

One notable limitation of doing this is that you can no longer use the , (comma) or (space) characters verbatim. They have to be escaped, e.g. using the \ooo octal escape sequence, as in EMIT_STRING(H,e,l,l,o,\054,\040,W,o,r,l,d,!).
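
As a sanity check on those escape codes, here is a small sketch in portable C11. It assumes an ASCII-compatible execution character set, and hello_w is a hypothetical name. The array merely spells out the character literals that the charized arguments are expected to turn into; it does not use #@ itself, which is Microsoft-specific:

```c
/* Assumes an ASCII-compatible execution character set. */
_Static_assert('\054' == ',', "octal 054 (decimal 44) is the comma");
_Static_assert('\040' == ' ', "octal 040 (decimal 32) is the space");

/* The bytes that EMIT_STRING(H,i,\054,\040,W) is expected to amount to,
   including the appended NUL terminator. */
static const char hello_w[] = { 'H', 'i', '\054', '\040', 'W', '\0' };
```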


Complete code for reference (EMIT4 through EMIT62 omitted for brevity):
#define EMIT1(c1) __asm _emit #@c1
#define EMIT2(c1, c2) EMIT1(c1) __asm _emit #@c2
#define EMIT3(c1, c2, c3) EMIT2(c1, c2) __asm _emit #@c3
...
#define EMIT63(c1, c2, ..., c63) EMIT62(c1, c2, ..., c62) __asm _emit #@c63

// Workaround for MSC - required since __VA_ARGS__ is interpreted as a single token.
#define EXPAND( x ) x
// get number of arguments with __NARG__
#define __ARG_N( \
      _1, _2, _3, _4, _5, _6, _7, _8, _9,_10, \
     _11,_12,_13,_14,_15,_16,_17,_18,_19,_20, \
     _21,_22,_23,_24,_25,_26,_27,_28,_29,_30, \
     _31,_32,_33,_34,_35,_36,_37,_38,_39,_40, \
     _41,_42,_43,_44,_45,_46,_47,_48,_49,_50, \
     _51,_52,_53,_54,_55,_56,_57,_58,_59,_60, \
     _61,_62,_63,N,...) N
#define __RSEQ_N() \
     63,62,61,60,                   \
     59,58,57,56,55,54,53,52,51,50, \
     49,48,47,46,45,44,43,42,41,40, \
     39,38,37,36,35,34,33,32,31,30, \
     29,28,27,26,25,24,23,22,21,20, \
     19,18,17,16,15,14,13,12,11,10, \
     9,8,7,6,5,4,3,2,1,0
#define __NARG_I_(...) EXPAND(__ARG_N(__VA_ARGS__))
#define __NARG__(...)  __NARG_I_(__VA_ARGS__,__RSEQ_N())

// general definition for any function name
#define _VFUNC_(name, n) name##n
#define _VFUNC(name, n) _VFUNC_(name, n)
#define VFUNC(func, ...) EXPAND(_VFUNC(func, __NARG__(__VA_ARGS__)) (__VA_ARGS__))

// definition for EMIT_STRING
#define EMIT_STRING(...) VFUNC(EMIT, __VA_ARGS__) __asm _emit '\0'


Limitations:
  • , (comma) and (space) characters have to be escaped as \054 and \040 respectively.
  • Strings are limited to 63 characters in this implementation. While this can be expanded, there are Compiler Limits that impose restrictions.
  • Compiler warnings/errors are insanely limited during preprocessing. For example, if the invocation of the VFUNC macro results in a macro symbol that hasn't been previously defined (e.g. EMIT99), this will get silently ignored, and no code will be emitted.

Critique:

Except for making myself feel rather badass, the above implementation has little going for it when contrasted with inline assembly, the Way It's Meant To Be Played™:

const char txt[] = "Hello, World!";
__asm {
    push MB_OK
    push 0x0
    lea eax, [txt]
    push eax
    push 0x0
    call MessageBoxA
}
IInspectable