How to compare 32 bits of chars against 32 bits of chars in C++ inline assembly

Question

I want to compare two 4-char strings, for example "A","T","T","C" against "A","T","T","c". I have stored these chars in arrays in C++ and I want to compare the two words with a single instruction. Moreover, I don't want to use loops for the comparison. How can I load these words into the EAX and EBX registers and compare them with each other?

int _tmain()
{
char b[3],a[3];
b[0]='A',b[1]='T',b[2]='C',b[3]='G';
a[0]='A',a[1]='T',a[2]='C',a[3]='G';
__asm
{
    movzx eax,b[1]  //here i want to load b to eax
}
getchar();
return 0;
}

If there is another way to compare two words in a single instruction, please share it. Thank you.

You have arrays of size 3, but are assigning 4 things to them. — , Jul 31 '18 at 20:19
Why do you want assembly? The standard library functions are more efficient than any assembly you can write. I'll bet that `memcmp` will in general do better than any assembly you write. — Support Ukraine, Jul 31 '18 at 20:24
To expand on @4386427's comment this has a very strong odor of [Premature Optimization](http://wiki.c2.com/?PrematureOptimization) — dgnuff, Jul 31 '18 at 20:27
Please specify your programme behaviour further. In the example you are using, what result do you expect? — jan.sende, Jul 31 '18 at 20:29
`movzx eax,b[1]` is wrong, perhaps `movzx eax,b[0]` (assuming `b` is corrected to 4 elements). — Weather Vane, Jul 31 '18 at 20:33
C and C++ are different languages, choose one. From the look of it, I think you are programming in C, but who knows. — Jens Gustedt, Jul 31 '18 at 20:36
Thank you for all the comments. My program needs to compare at high speed, so I want to use assembly. — Reza Behboodi, Jul 31 '18 at 20:43
If I can mov the whole b array into eax or ebx, my problem will be fixed. — Reza Behboodi, Jul 31 '18 at 20:44
@RezaBehboodi Writing in assembly does not guarantee high speed. In fact it means you are going as fast as "You" know how to go. The compiler is probably much better at optimizing for speed. Thus if you use high-level constructs correctly, you will be going as fast as the "Compiler" knows how to go, which will probably beat "You" in most situations (and when it does not beat you, it will equal you). This is what dgnuff is talking about when he mentions premature optimization. — Martin York, Jul 31 '18 at 20:54

Peter Cordes · Answer 1 · 2018-07-31T22:38:12.323

The rest of this answer assumes you need to use inline asm for some homework assignment (because it will not be more efficient than what a smart compiler will inline for a 4-byte memcmp). See @MartinYork's answer for what gcc/clang do for a 4-byte memcmp. Surprisingly, only gcc 7 and later inline the constant-size memcmp, while clang manages it at least as far back as 3.5.

MSVC 2017 also inlines memcmp for a constant 4-byte size, as well as std::array's operator==, producing the same asm as gcc/clang. (I didn't test earlier versions.) See the pure C++ version on the Godbolt compiler explorer.
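For reference, a pure C++ counterpart might look roughly like the sketch below. It is not necessarily the exact code behind the Godbolt link; the function names `equal4_memcmp`, `equal4_array` and `foo_cpp` are made up for illustration. Whether a given compiler really turns these into a single dword compare is something to confirm in its asm output.

```cpp
#include <array>
#include <cstring>

// Compare two 4-char words with memcmp. With a constant size of 4, compilers
// that inline memcmp can emit one dword load plus one dword compare.
bool equal4_memcmp(const char (&a)[4], const char (&b)[4])
{
    return std::memcmp(a, b, sizeof a) == 0;
}

// The same comparison through std::array's operator==.
bool equal4_array(const std::array<char, 4>& a, const std::array<char, 4>& b)
{
    return a == b;
}

// Hypothetical counterpart of the inline-asm foo(): the arrays differ
// ('C' vs 'T' in the second byte), so an optimizer may fold this to false.
bool foo_cpp()
{
    const char a[] = {'A', 'C', 'T', 'G'};
    const char b[] = {'A', 'T', 'T', 'G'};
    return equal4_memcmp(a, b);
}
```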

The necessary syntax to load a dword from a char array is a dword ptr size override.

// true for equal, false for not-equal
bool foo()
{
    //char a[] = "ACTG";
    char a[] = {'A', 'C', 'T', 'G'};
    char b[] = {'A', 'T', 'T', 'G'};
    _asm {
        mov eax, dword ptr a       // plain "mov eax, a" would be rejected: operand size conflict
        cmp eax, dword ptr b
        sete al                    // al= 0 or 1 depending on ZF, the "e" condition like je
    }
    // falling off the end of a non-void function implicitly returns EAX
    // apparently this is supported in MSVC even when inlining
}

As a complete function, this compiles as follows with MSVC 19 (2017) and -Ox on the Godbolt compiler explorer:

 ;; define a couple assembler constants for use
_a$ = -8                                                ; size = 4
_b$ = -4                                                ; size = 4
foo PROC
        sub      esp, 8
        mov      DWORD PTR _a$[esp+8], 1196704577 ; 47544341H
        mov      DWORD PTR _b$[esp+8], 1196708929 ; 47545441H
  ;; inline asm block starts here
        mov      eax, DWORD PTR _a$[esp+8]
        cmp      eax, DWORD PTR _b$[esp+8]
        sete     al
  ;; and ends here
        add      esp, 8
        ret      0
foo ENDP

The first two mov instructions are generated by the compiler; they store the 4-byte arrays to the stack with a dword MOV-immediate each.
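The immediates are just the four chars packed in memory order. On a little-endian target like x86, the first char lands in the low byte: 'A'=0x41, 'C'=0x43, 'T'=0x54, 'G'=0x47 gives 0x47544341 = 1196704577, and "ATTG" gives 0x47545441 = 1196708929. A small sketch (the helper name `load_dword` is hypothetical) that reproduces what a dword load sees, using memcpy rather than a pointer cast to stay within defined behavior:

```cpp
#include <cstdint>
#include <cstring>

// Read the first 4 chars as one 32-bit word, exactly as a dword load would.
// memcpy is well-defined here; *(uint32_t*)s would break strict aliasing.
// The numeric result depends on byte order (little-endian assumed on x86).
std::uint32_t load_dword(const char* s)
{
    std::uint32_t w;
    std::memcpy(&w, s, sizeof w);
    return w;
}
```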

If you want to return a 0 / non-0 int instead of a 0 / 1 bool, you can use @P__J__'s suggestion of mov / sub instead of checking flags after a cmp. Two equal dwords will leave the register 0, anything else won't. (xor has the same property.)
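The same 0 / non-zero idea can be expressed in plain C++ without inline asm. This is a sketch under the assumption that you only need "equal or not", not an ordering; `diff4` is a made-up name. XOR is used here, and subtraction would work equally well:

```cpp
#include <cstdint>
#include <cstring>

// Returns 0 exactly when the first 4 bytes of a and b are identical,
// mirroring mov + sub (or mov + xor) on two dwords.
std::uint32_t diff4(const char* a, const char* b)
{
    std::uint32_t wa, wb;
    std::memcpy(&wa, a, sizeof wa);   // one dword load for each side
    std::memcpy(&wb, b, sizeof wb);
    return wa ^ wb;                   // wa - wb would also be 0 iff equal
}
```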

If you wanted to compare 4 bytes of a char* that you got as a function arg, it would be a pointer, not a C array, so you have to load the pointer into a register yourself in inline asm. That's true even if the compiler already has the pointers in registers: MSVC inline asm syntax basically sucks for small blocks because it forces a store/reload round-trip (~5 cycles of latency) for inputs, and for outputs too, unless you can use the apparently-supported hack of leaving something in EAX and falling off the end of a non-void function. See also What is the difference between 'asm', '__asm' and '__asm__'? for a comparison with GNU C inline asm, which makes it easy to ask for inputs in registers and produce multiple outputs in registers, allowing the compiler to optimize as much as possible. Of course, inline asm still defeats constant propagation: if you used memcmp, the compiler could just return 0 because the arrays have compile-time-constant contents. (https://gcc.gnu.org/wiki/DontUseInlineAsm)

Anyway, this is what you get for comparing the first 4 bytes of function args:

char bar(char *a, char *b)
{
    // a and b are pointers, not arrays
    _asm {
        mov eax, a              // loads the address
        mov eax, [eax]          // loads 4 bytes of data
        mov ecx, b
        cmp eax, [ecx]
        sete al
    }
}

bar PROC
        mov      eax, DWORD PTR _a$[esp-4]
        mov      eax, DWORD PTR [eax]
        mov      ecx, DWORD PTR _b$[esp-4]
        cmp      eax, DWORD PTR [ecx]
        sete     al
        ret      0

And it's actually worse if you compile with -Gv or whatever to enable a better calling convention that passes args in registers: the compiler has to spill the pointer args to the stack for asm to reload them, instead of it turning into a reg-reg move. AFAIK, there's no way via casting or whatever to get the compiler to load pointers into registers for you so you can reference the array contents directly in inline asm.
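By contrast, a plain C++ version of `bar` leaves register allocation to the compiler, so it can use the pointer args wherever the calling convention put them. A sketch (the name `bar_cpp` is hypothetical); whether your compiler inlines the constant-size memcmp is an assumption to check in its asm output:

```cpp
#include <cstring>

// Plain C++ equivalent of bar(): compares only the first 4 bytes.
// An optimizing compiler that inlines memcmp can emit load + cmp + sete
// straight from the argument registers, with no spill/reload.
bool bar_cpp(const char* a, const char* b)
{
    return std::memcmp(a, b, 4) == 0;
}
```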

score 2 · Accepted Answer · edited Jul 31 '18 at 20:44


To start with, you have a serious problem with your arrays. You define the arrays to hold 3 elements, but you try to store 4 elements in them. That's really bad and causes undefined behavior.

Besides that... drop the assembly! The library functions will (in nearly all cases) outperform what you can do in assembly. In other words, just use memcmp.

Like:

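#include <stdio.h>
#include <string.h>

int main()
{
    char b[4], a[4];
    b[0]='A'; b[1]='T'; b[2]='C'; b[3]='G';
    a[0]='A'; a[1]='T'; a[2]='C'; a[3]='G';

    // memcmp returns 0 when all sizeof(a) bytes are identical
    if (memcmp(a, b, sizeof(a)) == 0)
         printf("Equal\n");
    else
         printf("Different\n");

    return 0;
}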

edited Jul 31 '18 at 20:44 by Cheers and hth. - Alf

answered Jul 31 '18 at 20:32 by Support Ukraine

Thank you for your answer. I want exactly this solution in assembly. – Reza Behboodi Jul 31 '18 at 20:46

@RezaBehboodi Once again, why assembly? An assignment at school? If so, just go and download one of the many publicly available versions of `memcmp` - it's available out there... or just compile a program with `memcmp` and look at the disassembly... but it won't make you a better C programmer - if you want to learn C then stick to C and leave the details to platform experts. – Support Ukraine Jul 31 '18 at 20:51
@4386427: you're totally missing the point by suggesting finding a full implementation of memcmp. The OP wants to learn how it can be done for the special case of size = 4, where it can be done with one `dword` compare. (You'll get even better asm from a smart compiler that inlines it this way for you, though; see Martin's answer, and links in mine for why MSVC inline asm is crap for wrapping 1 or a couple instructions: store/reload delay is unavoidable, and it can't optimize for constants.) – Peter Cordes Jul 31 '18 at 22:30
What is the structure of the memcmp function? Does it use a loop? – Reza Behboodi Aug 02 '18 at 08:47
@RezaBehboodi Maybe. If you use `memcmp` in production code with full optimization turned on, it definitely won't compare a byte at a time. For your case, the compiler will almost certainly recognize that the size to be compared is four, and optimize down to a simple register to register comparison. For larger buffers, it will use a loop, but if memory serves, at least for MSVC, it compares four bytes at a time in x86 mode, and might compare eight at a time in x64 mode. – dgnuff Aug 02 '18 at 22:48
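To illustrate the "four bytes at a time" idea for larger buffers, here is a minimal word-at-a-time equality check. It is a sketch, not how MSVC's or any other library's memcmp is actually implemented, and unlike memcmp it only answers equal / not equal (no ordering). The name `equal_words` is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compare n bytes, 4 bytes per iteration, then finish the 0..3 byte tail.
bool equal_words(const char* a, const char* b, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        std::uint32_t wa, wb;
        std::memcpy(&wa, a + i, 4);   // unaligned-safe 4-byte loads
        std::memcpy(&wb, b + i, 4);
        if (wa != wb)
            return false;
    }
    for (; i < n; ++i)                // leftover bytes one at a time
        if (a[i] != b[i])
            return false;
    return true;
}
```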

score 2 · Answer 3 · edited Jul 31 '18 at 22:34


I am going to say that doing it in assembly is a bad idea.

You should be using high level language constructs. This will allow the code to be portable and when push comes to shove the compiler will beat "most" humans at any peephole optimization like this.

So I checked the output of g++ to see what assembly it generated.

main.cpp

#include <array>
#include <iostream>

bool testX(int a, int b);
bool testY(std::array<char, 4> const& a, std::array<char, 4> const& b);
bool testZ(char const(&a)[4], char const(&b)[4]);

int main()
{
    {
        int a = 'ATCG';
        int b = 'ATCG';
        if (testX(a, b)) {
            std::cout << "Equal\n";
        }
    }
    {
        std::array<char, 4> a {'A', 'T', 'C', 'G'};
        std::array<char, 4> b {'A', 'T', 'C', 'G'};
        if (testY(a, b)) {
            std::cout << "Equal\n";
        }
    }
    {
        char    a[] = {'A', 'T', 'C', 'G'};
        char    b[] = {'A', 'T', 'C', 'G'};

        if (testZ(a, b)) {
            std::cout << "Equal\n";
        }
    }
}

With optimization enabled, we get nice asm from clang, and usually from recent gcc on the Godbolt compiler explorer. (The main above would optimize away the compares if the functions can inline, because the inputs are compile-time constants.)

X.cpp

bool testX(int a, int b)
{
    return a == b;
}

# gcc and clang -O3 asm output
testX(int, int):
    cmpl    %esi, %edi
    sete    %al
    ret

Z.cpp

#include <cstring>

bool testZ(char const(&a)[4], char const(&b)[4])
{
    return std::memcmp(a, b, sizeof(a)) == 0;
}

Z.s

# clang, and gcc7 and newer, -O3
testZ(char const (&) [4], char const (&) [4]):
    movl    (%rdi), %eax
    cmpl    (%rsi), %eax
    sete    %al
    retq

Y.cpp

#include <array>

bool testY(std::array<char, 4> const& a, std::array<char, 4> const& b)
{
    return a == b;
}

Y.s

# only clang does this.  gcc8.2 actually calls memcmp with a constant 4-byte size
testY(std::array<char, 4ul> const&, std::array<char, 4ul> const&):           
    movl    (%rdi), %eax
    cmpl    (%rsi), %eax
    sete    %al
    retq

So std::array and memcmp for comparing 4-byte objects both produce identical code with clang, but with gcc only memcmp optimizes well.

Of course, the stand-alone version of the function has to actually produce a 0 / 1 integer, instead of just setting flags for a jcc to branch on directly. The caller of these functions will have to test %eax,%eax before branching. But if the compiler can inline these functions, that overhead goes away.
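One way to give the compiler that chance is to keep the helper visible to its callers (same file or a header) instead of in a separately compiled X.cpp/Y.cpp/Z.cpp. A sketch with hypothetical names `same4` and `count_matches`; whether the inlined compare really feeds a `jcc` directly depends on the compiler and optimization level:

```cpp
#include <cstring>

// Defined inline so callers can inline it; the bool result then need not
// be materialized as 0/1 before branching.
inline bool same4(const char* a, const char* b)
{
    return std::memcmp(a, b, 4) == 0;
}

// Counts how many of the n words start with the same 4 bytes as key.
int count_matches(const char* const* words, int n, const char* key)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (same4(words[i], key))   // candidate for cmp + jcc after inlining
            ++count;
    return count;
}
```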

edited Jul 31 '18 at 22:34 by Peter Cordes

answered Jul 31 '18 at 21:13 by Martin York

Looks like you compiled with optimization disabled (`-fomit-frame-pointer` is enabled at -O1 or -O2 with clang, but your code uses RBP as a frame pointer.) `jmp` is an unconditional jump and `mov` isn't a compare. So this is basically nonsense. Write a function that takes two inputs that aren't compile-time constants, and compile with optimization. BTW, C doesn't support multi-byte character constants the way assemblers like NASM do, but apparently it's a GNU C extension because it does work. https://godbolt.org/g/dp3NJM. Also, did you manually remove the leading `.` from your labels? – Peter Cordes Jul 31 '18 at 21:23
@PeterCordes: "C doesn't support multi-byte character constants" - where did you get that idea? – Cheers and hth. - Alf Jul 31 '18 at 21:35
@Martin: Please show the assembly of a separately compiled function that takes two arguments and compares them. With optimization on. The code presented *could* be correct, in that the equality check has been just optimized completely away, but. Upvoted for the general approach and advice. The details aren't that terribly important, but nice to get those also clearly correct, not like a bit suspicious. – Cheers and hth. - Alf Jul 31 '18 at 21:37
@Cheersandhth.-Alf: Oh my mistake, apparently it's implementation-defined to use a constant like `'ATCG'` that's not only multiple bytes but multiple UTF8 *characters*. [Multi-character constant warnings](https://stackoverflow.com/q/7755202). I was just making assumptions based on the gcc warning in the godbolt link in my previous comment. But still the standard definitely doesn't guarantee support. (And I phrased it wrong, I should have said multi-character character constants.) – Peter Cordes Jul 31 '18 at 21:51
@PeterCordes: I think your terminology's better than g++'s. :) The characters here are all ASCII, single byte. When the execution character set is UTF-8 then Norwegian `'ø'` is an example of a multi-byte UTF-8 constant. The type is `int` instead of the apparent `char`. :( – Cheers and hth. - Alf Jul 31 '18 at 21:57
Uh oh, the code for `std::array` looks like an unrolled loop. But I have difficulty reading the AT&T syntax. It's all percent-signs to me. – Cheers and hth. - Alf Jul 31 '18 at 22:01
Assembly now generated. `std::array` loses. I'll let you guys decide whether it beats a hand-written version, as I have no idea. I still stand by it: using the high-level language is the better idea. – Martin York Jul 31 '18 at 22:01
@MartinYork: Just for the record, that's LLVM-generated asm, not from GCC's back-end. I can tell from the label names like `LBB0_1` and non-labeled basic-block comments like `## BB#6:`. Gcc uses label names like `.L1` / `.L2` instead of `.LBB_`. I guess your Mac has clang installed as `g++`? – Peter Cordes Jul 31 '18 at 22:07
Also, you can use the Godbolt compiler explorer (http://godbolt.org/) to remove noise from gcc output (and make it easy to toggle between Intel / AT&T syntax, or add compiler options like `-O3`.) Including a Godbolt link along with asm output makes it easy for readers like @Cheersandhth.-Alf to flip it to Intel syntax, or play with it to see what source changes or optimization options do to the asm. – Peter Cordes Jul 31 '18 at 22:08
As far as what's optimal, these are all pointless because you built them without optimization. Of course coalescing into a single dword compare didn't happen. It really does only need two instructions to load one dword and `sub` the other. (If returning a 0 / non-zero `int` instead of a `bool` is an option). – Peter Cordes Jul 31 '18 at 22:10
Martin: I fixed your answer to show *optimized* compiler output from gcc/clang on Godbolt. It should match what you get from `g++ -O3` or `clang++ -O3` on your Mac desktop, the calling convention is the same. clang and gcc both inline memcmp (although only gcc7 and newer managed it, vs. even old clang 3.5 doing fine.) I kept the AT&T syntax, even though the question is clearly using Intel syntax, because this is your answer. I'd highly recommend showing the Intel-syntax output instead, though. Ping @Cheersandhth.-Alf in case you're interested. – Peter Cordes Jul 31 '18 at 22:29
@PeterCordes: I believe Xcode now uses clang as the backend compiler. But for backward compatibility gcc is an alias to clang (I think). – Martin York Jul 31 '18 at 22:41
Yup, that's what I've seen from other posts on SO, that g++ is an alias for clang on some MacOS setups. And clang has definitely been Apple's go-to compiler for a while; Apple funded LLVM/clang because of its more permissive non-GPL open-source license, and it's a very good compiler these days, better than gcc on some functions, but sometimes worse. I just wanted to point out that labeling it `g++` output was misleading. I mean yes that's what you got from the backwards-compat alias... – Peter Cordes Jul 31 '18 at 22:50
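Since the value of a multi-character constant like `'ATCG'` (used in testX above) is implementation-defined, the value can also be built explicitly. This sketch puts the first char in the most significant byte, which matches how GCC documents its evaluation of multi-character constants; the name `pack4` is hypothetical. Note that this is not the same number a little-endian dword load of the array produces, so compare packed values only with other packed values:

```cpp
#include <cstdint>

// Pack four chars into one 32-bit value, first char in the top byte.
// unsigned char casts avoid sign extension for chars >= 0x80.
constexpr std::uint32_t pack4(char c0, char c1, char c2, char c3)
{
    return (std::uint32_t(static_cast<unsigned char>(c0)) << 24) |
           (std::uint32_t(static_cast<unsigned char>(c1)) << 16) |
           (std::uint32_t(static_cast<unsigned char>(c2)) << 8)  |
            std::uint32_t(static_cast<unsigned char>(c3));
}

static_assert(pack4('A', 'T', 'C', 'G') == 0x41544347u,
              "'A'=0x41 lands in the most significant byte");
```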

score 0 · Answer 4 · edited Jul 31 '18 at 20:34

Something like this:

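int result;
__asm {
    mov eax, 'A'
    mov ebx, 'C'

    cmp eax, ebx
    jae input_a            // taken when eax >= ebx (unsigned compare)
    mov result, -1         // fall-through: 'A' < 'C'
    jmp endofMain
input_a:
    mov result, 1          // 'A' >= 'C'
endofMain:
}
// print the outcome from C/C++ rather than from inside the asm block
printf(result < 0 ? "'A' < 'C'\n" : "'A' >= 'C'\n");
return 0;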

edited Jul 31 '18 at 20:34 by Weather Vane

answered Jul 31 '18 at 20:33 by HadiMa

Thank you for your answer. How can I use mov like this: mov eax,b where b is an array of char? I want to load the whole data of the b[4] array into the eax register. – Reza Behboodi Jul 31 '18 at 20:48
The OP wants MSVC inline-asm syntax for loading 4 bytes from a C `char` array, not for using mov-immediate inside the asm statement. – Peter Cordes Jul 31 '18 at 21:25

score -1 · Answer 5 · answered Jul 31 '18 at 21:51


#include <stdint.h>
#include <stdio.h>

int main()
{
volatile char b[4],a[4];
b[0]='A';b[1]='T';b[2]='C';b[3]='G';
a[0]='A';a[1]='T';a[2]='C';a[3]='G';

uint32_t val;


__asm__("movl %0, %%eax;" : "=m" (a) : "m" (a));
__asm__ ( "subl %1, %%eax;" : "=a" (val) : "m" (b) );

printf("%s\n", !val ? "Equal" : "Not equal");

}

answered Jul 31 '18 at 21:51 by 0___________

Please leave a comment if you downvote. – 0___________ Jul 31 '18 at 21:54
That's GNU C syntax (not the MSVC syntax the question is asking about), and you can't assume that `eax` is unmodified between the two `__asm__` statements. You also can't clobber the compiler's `eax` without telling it in the first statement. Put both those instructions in a single asm statement and both those problems go away. – Peter Cordes Jul 31 '18 at 21:54
The `sub` is a good idea, though, to produce a 0 / non-zero instead of a bool. – Peter Cordes Jul 31 '18 at 22:12
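Following that suggestion, here is a sketch with both instructions in one GNU C asm statement (g++/clang syntax, x86 only, not MSVC). The result goes into a compiler-chosen register via an output operand, so nothing relies on EAX surviving between statements and nothing is clobbered behind the compiler's back. The name `sub4` and the non-x86 fallback are additions for illustration:

```cpp
#include <cstdint>
#include <cstring>

// Returns 0 iff the two 4-byte words are equal (mov + sub in one statement).
std::uint32_t sub4(const char (&a)[4], const char (&b)[4])
{
    std::uint32_t val;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("movl %1, %0\n\t"
            "subl %2, %0"
            : "=&r"(val)          // early-clobber: written before %2 is read
            : "m"(a), "m"(b)      // the 4 bytes of each array, in memory
            : "cc");              // sub modifies the flags
#else
    // Portable fallback for other compilers/targets.
    std::uint32_t wa, wb;
    std::memcpy(&wa, a, 4);
    std::memcpy(&wb, b, 4);
    val = wa - wb;
#endif
    return val;
}
```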
