Call scanf in X64 gcc inline asm

Question

I have a dynamic allocated 2d int array, called image, and a format string called format. I then use two nested for loops to get the inputs from standard input, and store them in the 2d array. So I can parse the integers from the inputs with different lengths DYNAMICALLY. For example, if I have a 3x3 2d array, I will need to use inline asm to push the element address in the array 9 times, and a push to the format string. I then call scanf, and balance the stack when finished.

BTW:The width and height of the array is assumed known already.

Here is my code on Windows (X64 system, compiled in x32 code). It works fine.

for (int i = 0; i < height; i++) {
    for (int j = width-1; j >=0; j--) {
        int tmp_addr = (int)&image[i][j];
        __asm push tmp_addr;
    }
    int pop_size = (width+1) * 4;
    __asm {
        push format;
        call func_scanf;
        mov read_size, eax;
        add esp, pop_size;
    }
}

The code did not work when I port it onto Linux (X64 system, compiled in X64 code).

for (int i = 0; i < height; i++) {
    for (int j = width-1; j >=0; j--) {
        long tmp_addr = (long)&image[i][j];
        //__asm push tmp_addr;
        __asm__ __volatile__(
            "push %0\n\t"
            ::"g"(tmp_addr)
        );
    }

    int pop_size = (width+1) * sizeof(long);
    /*__asm {
        push format;
        call func_scanf;
        mov read_size, eax;
        add esp, pop_size;
    }*/
    __asm__ __volatile__(
        "push %0\n\t"
        "call *%1\n\t"
        "mov %%rax,%2\n\t"
        "add %3,%%rsp"
        ::"g"(format),"g"(func_scanf),"g"(read_size),"g"(pop_size)
        :"%rax","%rsp"
    );
}

The error says Segmentation fault when this code is executed. What could go wrong? Thank you!

Maybe I'm missing something, but can't you just call `scanf` in a loop? — Matteo Italia, Apr 08 '18 at 23:17
Why on earth are you coding this in assembler? You're using `scanf()` so efficiency clearly isn't the reason why — or, if it is, you need to do a review and show the measurements that demonstrate that this coding can make the difference, because frankly it is extraordinarily improbable that there is a measurable performance benefit. So, do you just like making your life harder? Is anyone else ever going to need to look at the code? It is unmaintainable (it isn't clear that it can be developed easily and reliably, but it certainly isn't maintainable). — Jonathan Leffler, Apr 08 '18 at 23:21
There are various places for undefined and implementation defined behaviour in the C code already. Before stating with inline assembly (for whatever reason), learn writing correct and maintainable C code. And without a [mcve] its not really possible to tell where/why your code is broken. — too honest for this site, Apr 08 '18 at 23:33
Using inline asm to only call library functions is like asking Usain Bolt to walk your dog and expect that to make it do its things any faster. — Bo Persson, Apr 08 '18 at 23:49
This is just a guess, but I'm reasonably certain that you also clobber the stack pointer when you PUSH and POP things, so `%rsp` should be listed in the "clobbered registers" list. For example, the compiler may PUSH a register's value before your inline ASM to save its value and restore its value with POP. If your PUSH and POP instructions are not matched in the same inline ASM string, your modification of the stack pointer can result in a crash. — , Apr 08 '18 at 23:50
The major problem is that i have a dynamic 2d array and the dynamic inputs length from stdin. So i need to parse the integers from the inputs dynamically. — markable, Apr 08 '18 at 23:54
@markable: again, what's the problem with calling `scanf` in a loop? Calling `scanf("%d %d", &a, &b);` is the same as calling `scanf("%d", &a); scanf("%d", &b);`. — Matteo Italia, Apr 09 '18 at 00:11
What is the content of the `format` string? That might be causing some trouble too. — , Apr 09 '18 at 00:26
It is like "%d %d %d ....", and varies its length dynamically. — markable, Apr 09 '18 at 01:18

Matteo Italia · Accepted Answer · 2018-04-09T13:47:51.500

x86_64 code on Linux uses a completely different calling convention than x86 code on Windows. In particular, it tries to pass many parameters in registers before starting to use the stack. Also, variadics have some subtle extra rules (e.g. you have to specify in rax the number of XMM registers actually used, 0 if none is used).

scanf expects to find the first six pointer arguments in registers, but you put them on the stack, and the registers contain garbage values (whatever happens to be there at the time of the call); when dereferencing any of them to write the read value you get a segfault.

Also, modern compilers generally don't use rbp as frame pointer to access locals and parameters, the frame pointer is omitted and access to locals is done through rsp. With your pushes you moved the stack pointer without the compiler knowing, and now every stack access between your pushes and the return from the function call is going to be broken. You hand try to hand-hold the compiler around this, but it's dirty business and prone to break.

Even worse: if gcc thinks your function is a leaf function (and it may think so if the only function call is inside your assembly code, which is opaque to the compiler) it may be exploiting the red zone, putting stuff below the current value of rsp. Your pushes and function call may be overwriting this data. You can try to fight even this, but again, it's ugly stuff.

So: it's clear why your code doesn't work, and it's also clear that it's quite complicated to make it work correctly on the x86_64 calling convention - you'd have to put stuff in different registers or on the stack depending from the iteration, and find a way to tell gcc that you are messing with the stack pointer and to avoid to use the red zone.

What is not clear to me is: what's the point of this thing? If you need to read many values, if the number of values is fixed you can do a "normal" invocation of scanf in plain C. If instead the number of values to read is known only at runtime, as it seems from your comment,

It is like "%d %d %d ....", and varies its length dynamically.

just invoke many times scanf with a format string appropriate to read a single value:

for (int i = 0; i < height; i++) {
    for (int j = 0; j < width; j++) {
        scanf("%d", &image[i][j]);
    } 
}

This will have the exact same semantics as your code (ln platforms where it is working). By the way, add some error handling (= check the return value of scanf), your program as is will just stop reading when meeting invalid values, and go on with uninitialized values in image.

If performance is a problem, just ditch scanf - you can easily beat it anyway by writing the tokenizing code by hand and then calling strtol; you can even go faster than strtol (if you don't care about locale) by writing the conversion code by hand.

In any case, getting down to assembly level to construct a variadic call to scanf is a terrible, non portable solution in search of a problem.

Call scanf in X64 gcc inline asm

1 Answers1