I'm currently calling external C++ functions from assembly to do simple string input. My overall goal is to concatenate two user-input strings, but for some reason I get a read access violation when returning from the inputString function the second time.

After looking into it, I suspect that my function is returning both strings at the same address. I also think I could be managing the stack incorrectly.

extern inputString : proto C
extern outputStringLen : proto C

.data 

.CODE
asm_main PROC
    sub rsp, 20h                ; creating shadow space
    call inputString            
    mov rcx, rax                
    call outputStringLen
    mov r12, rax

    call inputString            
    mov rcx, rax                
    call outputStringLen
    mov r13, rax

    add rsp, 20h                ; deleting created space
    ret                         ; ret to stack
asm_main ENDP
END

C++ code:

extern "C" string inputString() {
    string strInput;
    cout << "Enter string input: ";
    cin >> strInput;
    return strInput;
}

extern "C" int outputStringLen(string strInput) {
    int strLength = 0;

    for (int i = 0; i < strInput.length(); i++) {
        strLength++;
    }

    return strLength;
}
Peter Cordes
Suezzeus
  • what's `outputStringLen`? – Alan Birtles May 11 '21 at 17:20
  • You're not passing a pointer to the `std::string` return-value object for `inputString` to store its result in. It's typically a 32-byte object and doesn't fit in RAX. Use a debugger to see which instruction faults. (Probably one in `inputString`) – Peter Cordes May 11 '21 at 17:21
  • Edited the code I posted, it should be there @AlanBirtles. – Suezzeus May 11 '21 at 17:22
  • Note that your overall goal is fairly complicated, especially given that C++ implementations use short-string optimizations (keeping the data inside the std::string object itself), but for longer strings use the same space to store 3 pointers like a std::vector. Or did you want to call a C++ function that uses `std::string::operator+`? – Peter Cordes May 11 '21 at 17:26
  • @PeterCordes Ah thank you, I was wondering how that worked honestly. I have the two functions working properly though. It will accurately find the length of the string. However, it will fail when I try to run the functions again. – Suezzeus May 11 '21 at 17:27
  • you can use the compiler output as a reference: https://godbolt.org/z/a4shMMqsP – Alan Birtles May 11 '21 at 17:30
  • If a function returns an object that does not fit in al, ax, eax, rax, or rax:rdx, the function takes a hidden first argument: a pointer to that object. The caller must allocate space for the object itself and pass a pointer to it to the function. So you need to allocate space for the *string* in your asm code and pass it via *rcx* to *inputString*. Returning an object is really a bad idea here; it is much better to change the signature of *inputString* to `void inputString(string* )` – RbMm May 11 '21 at 17:42
  • Also, `sub rsp, 20h` is wrong. At the entry point of a function, `rsp` is `10h*n + 8`, so you must subtract `10h*m + 8` to have the stack aligned by `10h` before calling another function. The minimum is `28h` – RbMm May 11 '21 at 17:51
  • Also, because you must allocate the *string* yourself, you also need to call the constructor/destructor for this *string*. It is really not a good idea to work with such objects and functions from asm – RbMm May 11 '21 at 17:56
  • minimal masm code - https://godbolt.org/z/KT3Keaq3E - you really want all this ? – RbMm May 11 '21 at 18:03

1 Answer

You're not passing a pointer to the std::string return-value object for inputString to store its result in. It's typically a 32-byte object and doesn't fit in RAX. Like most calling conventions, Windows x64 handles large struct/class returns (and returns of non-trivially-copyable objects) by having the caller pass a pointer as the first arg. https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160

Use a debugger to see which instruction faults. (Probably one in inputString, using an RCX that got stepped on earlier.)

Probably on the first call to your asm_main, RCX still happens to hold char **argv or some other valid pointer to writable memory. When you call inputString the first time, you're passing this as the pointer to the return-value object. But outputStringLen has probably stepped on RCX itself, so the 2nd call passes an invalid pointer.

i.e. The first call only happens to work, and would fail with a different caller for asm_main.
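
To make the hidden pointer concrete, here is a rough C++ model of what caller and callee effectively do for a std::string return value. The names makeString_lowered and callerSide are hypothetical, and this is only a sketch of the convention, not actual compiler output: the caller reserves storage, passes its address, and is responsible for destroying the object afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical "lowered" form of `std::string makeString()`:
// the caller supplies raw storage, the callee constructs the result there
// and hands the same pointer back (the role RAX plays on Windows x64).
std::string *makeString_lowered(void *retSlot) {
    assert(retSlot != nullptr);
    return new (retSlot) std::string("hello");   // placement new into the caller's storage
}

// What a caller has to do by hand: reserve aligned space, pass its address,
// use the object, then run the destructor itself.
std::size_t callerSide() {
    alignas(std::string) unsigned char slot[sizeof(std::string)];
    std::string *s = makeString_lowered(slot);
    std::size_t len = s->length();
    s->~basic_string();   // the caller owns the object, so the caller destroys it
    return len;
}
```

In hand-written asm, the `slot` array corresponds to stack space you reserve with sub rsp, and its address is what belongs in RCX before the call.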


This seems like a very complicated way to get your feet wet with assembly language!

std::string is not a trivially-copyable type; it has copy-constructors and a destructor, and is actually a container that can either hold data directly or point to dynamically-allocated storage.

MSVC even warns about using extern "C" on a function returning a std::string:

<source>(4): warning C4190: 'inputString' has C-linkage specified, but returns UDT 'std::basic_string<char,std::char_traits<char>,std::allocator<char>>' which is incompatible with C
C:/data/msvc/14.28.29914/include\xstring(4648): note: see declaration of 'std::basic_string<char,std::char_traits<char>,std::allocator<char>>'

Working with plain buffers (e.g. sub rsp, 88 to reserve 88 bytes on the stack) and having C++ functions that take a char * arg would be simpler in asm.
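
As a minimal sketch of that buffer-based interface (the names readWord, readTwoWords and inputWord are hypothetical, not from the question), the C++ side could look like this. Only a pointer and a size cross the language boundary, and concatenation is just reading the second word at the position where the first one ended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: read one whitespace-delimited word into buf, which has
// room for cap bytes including the terminating 0. Returns the stored length.
std::size_t readWord(std::istream &in, char *buf, std::size_t cap) {
    assert(buf != nullptr && cap > 0);
    std::string word;
    in >> word;                                    // stays empty on EOF or failure
    std::size_t n = word.size() < cap - 1 ? word.size() : cap - 1;  // truncate to fit
    std::memcpy(buf, word.data(), n);
    buf[n] = '\0';
    return n;
}

// Concatenate two words by reading the second one where the first one ends.
std::size_t readTwoWords(std::istream &in, char *buf, std::size_t cap) {
    std::size_t n1 = readWord(in, buf, cap);
    std::size_t n2 = readWord(in, buf + n1, cap - n1);
    return n1 + n2;
}

// C-callable entry point: only a pointer and an integer cross the boundary.
extern "C" std::size_t inputWord(char *buf, std::size_t cap) {
    std::cout << "Enter string input: ";
    return readWord(std::cin, buf, cap);
}
```

From asm, inputWord would take the buffer address in RCX and its capacity in RDX, and return the length in RAX. The std::istream parameter on the helpers is only there so the logic can be exercised with a std::istringstream instead of the keyboard.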

Speaking of which, to re-align RSP by 16 on entry to asm_main, you should adjust RSP by 16*n + 8. So at least sub rsp, 28h, since you aren't pushing anything.


C++ containers like std::string are hard to work with in asm

Your overall goal is fairly complicated, especially given that C++ implementations use short-string optimizations (keeping the data inside the std::string object itself), but for longer strings use the same space to store 3 pointers like a std::vector.

Or did you want to call a C++ function that uses std::string::operator+? That would make it easier, but you'd still leak memory for the two std::string return-value objects if you only return the concatenated string object. (If you'd written the caller in C++, it would have two std::string local vars, and would run their destructors on exit.) I guess operator+= would mean you'd only need to dispose of one of them, since it would append to an existing std::string object if you pass it by reference.

Note that in asm, int outputStringLen(string strInput) looks basically the same as int outputStringLen(const string &strInput). Both take a pointer (because std::string is too large to pass in one register, so the Windows x64 calling convention requires the caller to create a temporary object and pass a pointer to it, to implement call by value). So it's just a matter of whether the caller creates a tmp object, or whether you pass a pointer to an existing object.
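
The difference is easy to observe from C++ with a type that counts its copy-constructions. Counted, lenByValue and lenByRef below are hypothetical stand-ins for std::string and the two outputStringLen signatures; this is a sketch of the caller-side cost, not a statement about any particular compiler's code generation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical string-like type that counts copy-constructions.
struct Counted {
    std::string text;
    static int copies;
    explicit Counted(const char *s) : text(s) {}
    Counted(const Counted &other) : text(other.text) { ++copies; }
};
int Counted::copies = 0;

// By value: the caller copy-constructs a temporary and the callee works on it.
std::size_t lenByValue(Counted c) { return c.text.length(); }

// By const reference: the caller passes the address of the existing object.
std::size_t lenByRef(const Counted &c) { return c.text.length(); }

void demoCopies() {
    Counted a("hello");
    Counted::copies = 0;
    assert(lenByRef(a) == 5);
    assert(Counted::copies == 0);   // no temporary was created
    assert(lenByValue(a) == 5);
    assert(Counted::copies == 1);   // the caller made exactly one copy
}
```

In hand-written asm, the by-value version means you would have to call the copy constructor and destructor for that temporary yourself.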

You should look at compiler output from a C++ function that calls your other C++ functions, to see what a compiler would do. Much of "How to remove 'noise' from GCC/clang assembly output?" applies, including the recommendation to put code on the Godbolt Compiler Explorer. For example, this caller:

#include <string>
#include <cstdlib>

extern "C" std::string inputString();
extern "C" size_t outputStringLen(const std::string &strInput);
//extern "C" size_t outputStringLen(std::string strInput);  // *much* more code to pass a copy by value

int sink;  // to show the output definitely going somewhere, not just staying in RAX
void asm_main(void) {
    std::string a = inputString();
    size_t len = outputStringLen(a);
    sink = len;
}

compiles with MSVC -O2 -GS-: https://godbolt.org/z/4YdG1bf4o. (Optimization removes a ton of store/reload and boils it down to the work that has to happen. -GS- removes a buffer-overflow check.)

a$ = 32
void asm_main(void) PROC                             ; asm_main, COMDAT
$LN36:
        sub     rsp, 72                             ; 00000048H
        lea     rcx, QWORD PTR a$[rsp]         ;;; output pointer
        call    inputString
        lea     rcx, QWORD PTR a$[rsp]         ;;; same pointer arg
        call    outputStringLen
        mov     rdx, QWORD PTR a$[rsp+24]
        mov     DWORD PTR int sink, eax   ; sink
        cmp     rdx, 16                       ;;; check for short-string => no delete
        jb      SHORT $LN16@asm_main
        mov     rcx, QWORD PTR a$[rsp]
        inc     rdx
        mov     rax, rcx
        cmp     rdx, 4096               ; 00001000H
        jb      SHORT $LN26@asm_main
        mov     rcx, QWORD PTR [rcx-8]
        add     rdx, 39                             ; 00000027H
        sub     rax, rcx
        add     rax, -8
        cmp     rax, 31              ;; some kind of invalid / corrupt std::string check?
        ja      SHORT $LN34@asm_main
$LN26@asm_main:
        call    void operator delete(void *,unsigned __int64)               ; operator delete
$LN16@asm_main:
        add     rsp, 72                             ; 00000048H
        ret     0
$LN34@asm_main:
        call    _invalid_parameter_noinfo_noreturn
        int     3
$LN32@asm_main:
void asm_main(void) ENDP                             ; asm_main

I don't know why it needs to check anything and conditionally call _invalid_parameter_noinfo_noreturn; that part is presumably never reached during normal execution so can basically be considered noise.

The pointer passing to inputString shows what you should have been doing:

a$ = 32

...
        sub     rsp, 72      ; shadow space + sizeof(std::string) + alignment padding

        lea     rcx, QWORD PTR a$[rsp]    ;;; Pointer to return-value object
        call    inputString
        lea     rcx, QWORD PTR a$[rsp]
        call    outputStringLen
        ...
        mov     DWORD PTR int sink, eax   ; sink

(I think in Windows x64, functions that return a large object via a hidden output pointer also have to return that pointer in RAX, so your mov rcx, rax is also safe.)

Also note the conditional call to operator delete, depending on the size of the std::string (detects short-string optimization to see if there was any dynamically-allocated storage to free).
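
You can poke at the short-string optimization from C++ as well. The helper below (storedInline is a hypothetical name) checks whether the character data lives inside the std::string object itself. It is a heuristic that depends on the standard-library implementation: I'd expect short strings to be stored inline with MSVC, libstdc++ (new ABI) and libc++, but nothing in the C++ standard guarantees that.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Heuristic, implementation-dependent check: does the character data live
// inside the std::string object itself (short-string optimization), or in
// separately allocated storage?
bool storedInline(const std::string &s) {
    auto obj  = reinterpret_cast<std::uintptr_t>(&s);
    auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj && data < obj + sizeof(std::string);
}

void demoStorage() {
    std::string shortStr("hi");
    std::string longStr(100, 'x');        // far too long to fit inside the object
    assert(!storedInline(longStr));       // must point to dynamically-allocated storage
    (void)storedInline(shortStr);         // typically true, but implementation-specific
}
```

Only the long-string case has storage that needs freeing, which is the case the compiler's size check before operator delete is guarding.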

And remember, this is the simple version; passing by const reference, not by value which would have to copy-construct another std::string object.


The ABI for std::string is determined by the implementation details in the C++ headers. It's not something that's particularly easy to interoperate with from asm. I'm partly showing you the details to warn you away from trying to do this, as much as to give you pointers to find the info you'd need to hand-write correct asm that interacts with C++ std::string. Normally you want to leave that to compilers.

A good rule of thumb is that functions that you want to call from asm should be actually callable from C, unless you want to write asm that knows about the C++ compiler's C++ ABI (e.g. layouts and other internal details of std::string). Taking or returning a std::string does not qualify: you can't teach a C compiler to properly handle a std::string because it has constructors and destructors, and overloaded operators. That's why MSVC complains about returning one by value in an extern "C" function.
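
If you do want std::string doing the work, one way to stay C-callable is an opaque-handle wrapper, so the caller never needs to know the object's size or layout. This is only a sketch: the str_* names are hypothetical, and it ignores that new or += can throw, which must not propagate into C or asm callers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-callable wrapper: callers only ever hold an opaque pointer.
extern "C" {
    void *str_create(const char *text) { return new std::string(text); }

    void str_append(void *dst, const void *src) {
        *static_cast<std::string *>(dst) += *static_cast<const std::string *>(src);
    }

    std::size_t str_length(const void *s) {
        return static_cast<const std::string *>(s)->length();
    }

    const char *str_data(const void *s) {
        return static_cast<const std::string *>(s)->c_str();
    }

    void str_destroy(void *s) { delete static_cast<std::string *>(s); }
}

void demoHandles() {
    void *a = str_create("foo");
    void *b = str_create("bar");
    str_append(a, b);                       // a now holds "foobar"
    assert(str_length(a) == 6);
    assert(std::strcmp(str_data(a), "foobar") == 0);
    str_destroy(b);                         // every handle must be destroyed exactly once
    str_destroy(a);
}
```

Every argument and return value here is a pointer or an integer, so each call from asm is just loading RCX/RDX and reading RAX.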

Try writing asm_main in actual C, not C++, and see what problems you run into.


Your outputStringLen is massively over-complicated. std::string is an explicit-length string, i.e. it knows its own length, so you can just ask for it: return strInput.length();. Looping for (i=0, j=0 ; i<n ; i++){ j++; } is a really inefficient way to write i = j = n;.

Perhaps you were thinking of char* C strings, with a 0 terminator, where you do have to loop (or call strlen) to find the length.
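
A small side-by-side of the two cases, with hypothetical function names: for std::string you just ask the object, while for a 0-terminated C string you scan for the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// std::string stores its length, so this is a single member call.
std::size_t lengthOfStdString(const std::string &s) { return s.length(); }

// A C string has no stored length: scan until the 0 terminator,
// which is conceptually what strlen does.
std::size_t lengthOfCString(const char *p) {
    std::size_t n = 0;
    while (p[n] != '\0') ++n;
    return n;
}

void demoLengths() {
    assert(lengthOfStdString(std::string("hello")) == 5);
    assert(lengthOfCString("hello") == std::strlen("hello"));
}
```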

Peter Cordes
  • You can get rid of the security stuff by passing `/GS-` to the compiler – Alan Birtles May 11 '21 at 18:10
  • @AlanBirtles: Thanks, that's a cleaner example. Is there a way to omit the check that leads to `call _invalid_parameter_noinfo_noreturn`? I assume that's for some kind of corrupted `std::string`? – Peter Cordes May 11 '21 at 18:14
  • @petercordes Thanks for this detailed response. I will try what you recommended. This is a problem from my assignment and they want me to write it so that my c++ code is only really utilizing cin and cout to concatenate 2 user-input strings. I am pretty unfamiliar with a lot of assembly still so it's going to take some time. – Suezzeus May 11 '21 at 19:00
  • @Suezzeus: Are you sure they require you to use `std::string`, though, and not `char *` / `char buf[100]` arrays, aka C strings? – Peter Cordes May 11 '21 at 19:02
  • @petercordes No they don't, I honestly was just thinking it was easier to manage. Since I didn't know how it returned back to asm. – Suezzeus May 11 '21 at 19:19
  • @Suezzeus: You should definitely use `char` arrays / pointers, then; there's much less overall complexity with that, and what you see is what you get (i.e. it's easy to see how a C function taking a `char *p` arg will compile, and to think through the necessary steps for concatenating two strings. e.g. simply reading the second into a position in a larger buffer that starts where the first one ends.) – Peter Cordes May 11 '21 at 19:50
  • @PeterCordes That makes sense. I'm hoping it will all fall into place in my head once I start working on it, thanks again. – Suezzeus May 12 '21 at 06:19