
I want to concatenate two bytes, `char byte1` and `char byte2`, into a single `short` in assembly.

How can I do it? Using shifts?

I'm working with IA32.

Peter Cordes
  • If you're using x86 (16 bit, 32 bit, or 64 bit) assembly then you can use partial registers to achieve this. For example, `al` is the lowest byte of the `ax`/`eax`/`rax` register, and `ah` is the second lowest byte. `ax` is made up of the parts `al` and `ah`. – ecm Oct 31 '20 at 17:14
  • Isn't `ah` the most significant part of `eax`? – José Soares Oct 31 '20 at 17:21
  • No, it is the most significant part (half) of `ax`, for historical reasons. The upper 2 bytes of `eax` do not have a specific name. – ecm Oct 31 '20 at 17:22
  • Can you please give me an example of two `char` bytes and the single `short` that results from concatenating them? – José Soares Oct 31 '20 at 17:25
  • Please specify an architecture when using the assembly tag, as answers vary dramatically, e.g. for x86 vs. MIPS or RISC V. – Erik Eidt Oct 31 '20 at 17:28
  • Yes, I forgot that. Thank you for advising me. My architecture is IA32. – José Soares Oct 31 '20 at 17:33
  • Where are your two bytes located, and where do you want to put the result (registers, memory, ...?) – Nate Eldredge Oct 31 '20 at 17:35
  • I have my main.c file, my concatBytes.h and my concatBytes.s. My main.c looks like this: `#include <stdio.h> #include "concatBytes.h" char byte1 = '11101010'; char byte2 = '10100001'; int main(){ short result = 0; result = concatBytes(); printf("Result = %hd",result); return 0; }` concatBytes.h: `short concatBytes(void);` I haven't solved concatBytes.s yet. – José Soares Oct 31 '20 at 17:43
  • How do you wish to concatenate them: 0xAA and 0xBB into 0xAABB, or into 0xBBAA? Sort that out, but yes, you shift just like in C: `x = (((unsigned short)byte1)<<8)|byte2;` – old_timer Oct 31 '20 at 17:59
  • Or you do the `ah`/`al` thing. – old_timer Oct 31 '20 at 17:59
  • But if you are making this a function that you call from C, then you also need to deal with the calling convention and where the bytes are when execution reaches the assembly language. You have global character strings, so you also need to take each string, convert it to binary, then combine the results into a single register or memory location and return it according to the calling convention of the compiler. The concat of two bytes is the extremely trivial portion of the task. Show the rest of your code. Start by writing the whole thing in C (using no C library calls), then port that to asm. – old_timer Oct 31 '20 at 18:02
  • Are you required to validate each character in the string as either '0' or '1' and return an error if not, or can you assume it is a valid and complete string with no more than 8 characters? – old_timer Oct 31 '20 at 18:06
  • (Well, first make a valid C program, then deal with whether this is a string or not, etc.) – old_timer Oct 31 '20 at 18:06
  • If it's not a string, you need to load the two global bytes and then put them together somewhere. – old_timer Oct 31 '20 at 18:07
  • No, I don't have to validate them – José Soares Oct 31 '20 at 18:59
  • @old_timer How can I do the same as the C `x = (((unsigned short)byte1)<<8)|byte2` but in assembly? – José Soares Oct 31 '20 at 19:15
  • @JoséSoares How much assembly do you already know? Try using the `shl` and `or` instructions. Alternatively, as others already said, move one byte into `al` and the other into `ah`. Then the result is in `ax`. (`bx`, `cx`, and `dx` can be used equivalently.) – fuz Oct 31 '20 at 19:34
  • @JoséSoares You look at the instruction set in the documentation that you downloaded long before starting any kind of assembly language work. You look through the instructions and find ones that can perform those functions, you read how to use them, then you use them. Just like any other language. – old_timer Oct 31 '20 at 22:38
  • @fuz I'm in college; I just started with assembly. But I've already solved this issue. Thank you, everyone. – José Soares Oct 31 '20 at 23:08
  • @JoséSoares Cool! Keep learning, you'll eventually master it. – fuz Nov 01 '20 at 09:26
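
Putting old_timer's and fuz's suggestions together: concatenating two bytes is one shift plus one OR, and the only real decision is which byte ends up in the high half. Here is a minimal C sketch of that, assuming the inputs are already unsigned 8-bit values (`uint8_t`); the name `concat16` is made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: put `hi` in bits 15..8 and `lo` in bits 7..0.
   Both parameters are uint8_t, so they promote to non-negative values
   and the shift/OR cannot smear sign bits into the result. */
uint16_t concat16(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

For example, `concat16(0xAA, 0xBB)` gives 0xAABB and `concat16(0xBB, 0xAA)` gives 0xBBAA. In assembly the same thing is a `shl $8` followed by an `or`, or equivalently writing the high byte to `ah` and the low byte to `al` so the result is in `ax`.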

2 Answers


I just solved the problem; here is what I did, in case somebody has the same problem:

concatBytes.s :

.section .data

.global byte1
.global byte2

.section .text
.global concatBytes


concatBytes:

#prologue 
    pushl %ebp 
    movl %esp, %ebp
    
    pushl %ebx
    
#body of the function

    movl $0, %eax       # zero EAX
    movb byte1, %al     # byte1 -> bits 7..0
    movb byte2, %ah     # byte2 -> bits 15..8; the short result is returned in %ax

#epilogue

    popl %ebx
    
    movl %ebp, %esp
    popl %ebp
    ret

  • That works, but there's no point in `push`/`pop` of EBX. (Or for that matter, setting up EBP as a frame pointer, since you're not using it.) Also, you could `movzbl byte1, %eax` instead of zeroing EAX and writing AL. – Peter Cordes Oct 31 '20 at 23:25
  • Also, you don't need `.global byte1` and 2 in this file; you're not *defining* those symbols, just referencing an existing symbol defined in a C file. If anything you'd need `.extern byte1, byte2`, but GAS doesn't need that; unknown symbols are already assumed to be external. (In NASM you *would* need `extern` declarations.) Putting those directives in `.section .data` is also pointless. You *do* need the `.global concatBytes` so compiler-generated code that references that extern-for-it symbol can reference the definition in your asm, though. `.global` is for *exporting*, not importing. – Peter Cordes Nov 01 '20 at 03:40
  • Oh, nice! I didn't know I don't need to `push`/`pop` EBX. My teacher just told me that if I used EBX, ESI or EDI I had to `push`/`pop` them. But in fact I didn't use EBX, so I think I can remove it! Thank you, everyone. – José Soares Nov 01 '20 at 14:54
  • Yes, exactly. Your teacher's description is correct because those registers are "call preserved" in the standard calling convention, and you don't touch EBX. (Fun fact: this applies to EBP as well; you don't *need* to bother setting it up as a frame pointer, and in this function you don't even access any stack memory except via push/pop. An optimizing compiler wouldn't touch EBP either in this function, if you wrote it in C and compiled on https://godbolt.org/. See also [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116).) – Peter Cordes Nov 01 '20 at 21:50
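
To see which byte ends up where in the asm from this answer, here is a small C model of the three instructions in the function body, treating EAX as a 32-bit integer. It is only an illustration (the helper names `write_al`, `write_ah` and `concat_bytes_model` are made up); it shows that byte1 lands in bits 7..0 and byte2 in bits 15..8, so the function returns byte2:byte1, which is the reverse of `(byte1 << 8) | byte2`:

```c
#include <stdint.h>

/* Models movb src, %al: replace bits 7..0 of EAX, keep the rest. */
uint32_t write_al(uint32_t eax, uint8_t v)
{
    return (eax & ~UINT32_C(0xFF)) | v;
}

/* Models movb src, %ah: replace bits 15..8 of EAX, keep the rest. */
uint32_t write_ah(uint32_t eax, uint8_t v)
{
    return (eax & ~UINT32_C(0xFF00)) | ((uint32_t)v << 8);
}

/* Models the body of concatBytes; a short return value is read from AX. */
uint16_t concat_bytes_model(uint8_t byte1, uint8_t byte2)
{
    uint32_t eax = 0;            /* movl $0, %eax   */
    eax = write_al(eax, byte1);  /* movb byte1, %al */
    eax = write_ah(eax, byte2);  /* movb byte2, %ah */
    return (uint16_t)eax;        /* low 16 bits = %ax */
}
```

With byte1 = 0xEA and byte2 = 0xA1 this gives 0xA1EA, which main.c's `%hd` prints as -24086 on x86. If byte1 is meant to be the high byte, swap which byte is written to `al` and which to `ah`.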

main.c

#include <stdio.h> 
#include "concatBytes.h" 
char byte1 = '11101010'; 
char byte2 = '10100001'; 
int main()
{ 
    short result = 0; 
    result = concatBytes(); 
    printf("Result = %hd",result); 
    return 0;
}

concatBytes.h

short concatBytes(void);

concatBytes.c

extern char byte1; 
extern char byte2;
short concatBytes(void)
{
    return(((short)byte1)^((short)byte2));
}

So, obviously:

gcc main.c concatBytes.c -o main
main.c:3:14: warning: character constant too long for its type
 char byte1 = '11101010'; 
              ^
main.c:3:14: warning: overflow in implicit constant conversion [-Woverflow]
main.c:4:14: warning: character constant too long for its type
 char byte2 = '10100001'; 
              ^
main.c:4:14: warning: overflow in implicit constant conversion [-Woverflow]

That is bad syntax, which leads to the question: did you mean this:

#include <stdio.h> 
#include "concatBytes.h" 
char byte1[] = "11101010"; 
char byte2[] = "10100001"; 
int main()
{ 
    short result = 0; 
    result = concatBytes(); 
    printf("Result = %hd",result); 
    return 0;
}

Or this:

#include <stdio.h> 
#include "concatBytes.h" 
char byte1 = 0xEA; 
char byte2 = 0xA1; 
int main()
{ 
    short result = 0; 
    result = concatBytes(); 
    printf("Result = %hd",result); 
    return 0;
}

Assuming the latter, compiling (for x86-64) and disassembling concatBytes gives:

0000000000400559 <concatBytes>:
  400559:   55                      push   %rbp
  40055a:   48 89 e5                mov    %rsp,%rbp
  40055d:   0f b6 15 d4 0a 20 00    movzbl 0x200ad4(%rip),%edx        # 601038 <byte1>
  400564:   0f b6 05 ce 0a 20 00    movzbl 0x200ace(%rip),%eax        # 601039 <byte2>
  40056b:   31 d0                   xor    %edx,%eax
  40056d:   66 98                   cbtw   
  40056f:   5d                      pop    %rbp
  400570:   c3                      retq   

That gives you a rough idea of the calling convention; then you simply replace the middle of that code with your "concatenate".

If it is a string, then you first need to convert each one to a byte, then concatenate. You can just as easily figure that one out...
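
If byte1 and byte2 really are strings like "11101010", the conversion is a loop that shifts the accumulated value left by one and ORs in each bit. A minimal C sketch (the name `bin_to_byte` is hypothetical), assuming, as the OP said in the comments, that no validation is needed beyond stopping at the first character that is not '0' or '1'; in asm this is the same shift/OR pattern applied once per character:

```c
#include <stdint.h>

/* Convert a string of '0'/'1' characters (most significant bit first)
   to a byte. Stops at the terminating NUL or any other character.
   If there are more than 8 digits, only the last 8 bits are kept. */
uint8_t bin_to_byte(const char *s)
{
    uint8_t value = 0;
    for (; *s == '0' || *s == '1'; ++s) {
        value = (uint8_t)((value << 1) | (uint8_t)(*s - '0'));
    }
    return value;
}
```

`bin_to_byte("11101010")` is 0xEA and `bin_to_byte("10100001")` is 0xA1; the two resulting bytes can then be combined with a shift and an OR.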

old_timer
  • Although the odds are high for x86, don't assume that the calling convention used by my compiler matches yours, or any other. Likewise x86-32 vs x86-64: you are most likely using the latter, along with the rest of us. But don't assume what you see here will work in your setup; work your way into the asm and then back out (whether you are mixing with C or some other environment). – old_timer Oct 31 '20 at 18:44
  • XOR doesn't concatenate bytes. You want zero-extending loads then shift/OR. (Unlike ARM, `char` = `signed char` for x86 ABIs; that's why it's sign-extending. Apparently GCC "optimized" to sign-extend the XOR result, instead of doing sign-extending loads.) – Peter Cordes Oct 31 '20 at 23:29
  • @PeterCordes Of course it doesn't; the whole point here was not to do the OP's homework assignment. As stated: then you simply replace the middle of that code with your "concatenate" – old_timer Oct 31 '20 at 23:45
  • Ok, that sentence didn't jump out at me when skimming through this answer. It's pretty buried and comes long after your definition of the misleadingly-named `concatBytes`. Perhaps call it `bytefunc` or `examplefunc` if you're not going to make it actually concat. Also, I don't see a lot of value in showing wrong C syntax for `char` constants. You could just use C++ `0b11101010`, which IIRC is supported as an extension by some C compilers. The OP doesn't say whether they're using C or C++ anyway, and C++ has `char` and `short` types, too. In fact so do some other languages... – Peter Cordes Oct 31 '20 at 23:55
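
Following Peter Cordes's comment, here is a sketch of a concatBytes.c that actually concatenates: zero-extend each `char` through `unsigned char`, then shift and OR. It assumes the numeric-constant version of main.c (byte1 = 0xEA, byte2 = 0xA1) and that byte1 should be the high byte; the globals are defined here only so the block compiles on its own (in the two-file program they stay `extern`). The extra function `concat_naive` is hypothetical and only shows what goes wrong without the zero-extension when `char` is signed, as it is in the x86 ABIs:

```c
#include <stdint.h>

/* In the real program these live in main.c and are declared extern here. */
char byte1 = (char)0xEA;
char byte2 = (char)0xA1;

short concatBytes(void)
{
    /* Zero-extend: (unsigned char) turns 0xEA into 234, not -22. */
    unsigned hi = (unsigned char)byte1;
    unsigned lo = (unsigned char)byte2;
    return (short)((hi << 8) | lo);   /* bit pattern 0xEAA1 */
}

/* Naive shift/OR with signed inputs (what char is on x86): lo
   sign-extends to 0xFFFFFFA1, so the OR sets every bit above bit 7. */
uint16_t concat_naive(signed char hi, signed char lo)
{
    return (uint16_t)(((unsigned short)hi << 8) | lo);
}
```

On a typical two's-complement compiler such as GCC, main.c's `%hd` prints the correct result 0xEAA1 as -5471, while the naive version yields 0xFFA1 because the sign bits of byte2 overwrite byte1.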