Doing this in assembly is a bad idea.
You should be using high-level language constructs. The code stays portable, and when push comes to shove the compiler will beat "most" humans at a peephole optimization like this.
So I checked the output of g++ to see what assembly it generated.
main.cpp
#include <array>
#include <iostream>
bool testX(int a, int b);
bool testY(std::array<char, 4> const& a, std::array<char, 4> const& b);
bool testZ(char const(&a)[4], char const(&b)[4]);
int main()
{
    {
        // Multi-character literal: the value is implementation-defined,
        // but the same literal gives the same int both times.
        int a = 'ATCG';
        int b = 'ATCG';
        if (testX(a, b)) {
            std::cout << "Equal\n";
        }
    }
    {
        std::array<char, 4> a {'A', 'T', 'C', 'G'};
        std::array<char, 4> b {'A', 'T', 'C', 'G'};
        if (testY(a, b)) {
            std::cout << "Equal\n";
        }
    }
    {
        char a[] = {'A', 'T', 'C', 'G'};
        char b[] = {'A', 'T', 'C', 'G'};
        if (testZ(a, b)) {
            std::cout << "Equal\n";
        }
    }
}
With optimization enabled, clang produces nice asm, and recent gcc usually does too (checked on the Godbolt compiler explorer). (If the functions could inline into the main above, the compares would optimize away entirely, because the inputs are compile-time constants.)
X.cpp
bool testX(int a, int b)
{
    return a == b;
}
X.s
# gcc and clang -O3 asm output
testX(int, int):
    cmpl    %esi, %edi
    sete    %al
    ret
Z.cpp
#include <cstring>
bool testZ(char const(&a)[4], char const(&b)[4])
{
    return std::memcmp(a, b, sizeof(a)) == 0;
}
Z.s
# clang, and gcc7 and newer, -O3
testZ(char const (&) [4], char const (&) [4]):
    movl    (%rdi), %eax
    cmpl    (%rsi), %eax
    sete    %al
    retq
Y.cpp
#include <array>
bool testY(std::array<char, 4> const& a, std::array<char, 4> const& b)
{
    return a == b;
}
Y.s
# only clang does this. gcc8.2 actually calls memcmp with a constant 4-byte size
testY(std::array<char, 4ul> const&, std::array<char, 4ul> const&):
    movl    (%rdi), %eax
    cmpl    (%rsi), %eax
    sete    %al
    retq
So with clang, std::array and memcmp both produce identical code for comparing 4-byte objects, but with gcc only memcmp optimizes well.
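If you want a single source form that does not depend on how a given compiler handles operator== for std::array, one common idiom is to memcpy the four bytes into a 32-bit integer and compare the integers. The following is a minimal sketch of that idea; load4 and equal4 are hypothetical names that are not part of the code above, and you should still check the generated asm for your own compiler and flags.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (an assumption for illustration): load 4 bytes into
// a 32-bit integer. memcpy avoids the alignment and aliasing problems that
// a pointer cast would have.
inline std::uint32_t load4(char const* p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Hypothetical helper: compare two 4-byte sequences as single integers.
// Both pointers must refer to at least 4 readable bytes.
inline bool equal4(char const* a, char const* b)
{
    return load4(a) == load4(b);
}
```

Because both sides are loaded the same way, the result does not depend on byte order; only equality is tested.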
Of course, the stand-alone version of each function has to actually produce a 0 / 1 value in %al, instead of just setting flags for a jcc to branch on directly. The caller of these functions will have to test %al,%al before branching (only the low byte holds the bool; sete leaves the rest of %eax untouched). But if the compiler can inline these functions, that overhead goes away.
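To illustrate the inlining point: when the comparison is defined where the caller can see it, the compiler is free to branch on the compare result directly instead of materializing a bool first. This is a sketch only; same4 and countMatches are hypothetical names introduced for the example, and whether the call really gets inlined depends on the compiler and optimization level.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: same comparison as testZ, but defined inline so the
// caller's translation unit can see the body.
inline bool same4(char const* a, char const* b)
{
    return std::memcmp(a, b, 4) == 0;
}

// Hypothetical caller: counts the positions in text[0, len) where the
// 4-byte key occurs. The branch on same4() is the kind of place where
// inlining lets the compiler branch straight off the compare.
inline std::size_t countMatches(char const* text, std::size_t len,
                                char const (&key)[4])
{
    std::size_t count = 0;
    for (std::size_t i = 0; i + 4 <= len; ++i) {
        if (same4(text + i, key)) {
            ++count;
        }
    }
    return count;
}
```

For example, with the 8-byte text "ATCGATCG" and the key {'A', 'T', 'C', 'G'}, countMatches finds matches at offsets 0 and 4.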