18

While going through some C code having inline assembly I came across the .byte (with a Dot at the beginning) directive.

On checking the assembly reference on web I found that it is used to reserve a byte in memory.

But in the code there was no label before the statement. So I was wondering what is use of an unlabeled .byte directive or any other data storage directive for that matter.

For e.g. if i code .byte 0x0a, how can i use it ?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
vjain27
  • 3,514
  • 9
  • 41
  • 60
  • `.byte 0x0a` specifically can be used as the new line character (it's exactly the same as typing `\n` inside one of your strings, as that sequence is transformed into `0x0A` by the assembler. I'm pretty sure the GNU assembler supports `\n` but not all assemblers do. – puppydrum64 Dec 07 '22 at 18:25

4 Answers4

11

There are a few possibilities... here are a couple I can think of off the top of my head:

  1. You could access it relative to a label that comes after the .byte directive. Example:

      .byte 0x0a
    label:
      mov (label - 1), %eax
    
  2. Based on the final linked layout of the program, maybe the .byte directives will get executed as code. Normally you'd have a label in this case too, though...

  3. Some assemblers don't support generating x86 instruction prefixes for operand size, etc. In code written for those assemblers, you'll often see something like:

      .byte 0x66
      mov $12, %eax
    

    To make the assembler emit the code you want to have.

Carl Norum
  • 219,201
  • 40
  • 422
  • 469
  • 2
    That assembler from 3) needs a patch, urgently :-) – Ciro Santilli OurBigBook.com Oct 27 '15 at 06:19
  • 1
    What is the difference between `.byte` and [`d*` pseudo-ops](http://stackoverflow.com/questions/10168743/x86-assembly-which-variable-size-to-use-db-dw-dd)? – Константин Ван Jan 21 '17 at 13:19
  • I expect they're the same. – Carl Norum Jan 21 '17 at 22:59
  • `.byte 0x66` is length-changing for `mov $12, %eax`, so your example will decode as `mov $12, %ax` / `add %al,(%rax)` or `(%eax)` in 32-bit mode. The 2nd instruction is the left-over `00 00` bytes of the 32-bit immediate that `mov eax, imm16` doesn't consume. This is tricky for the hardware, too, and causes LCP pre-decode stalls on Intel CPUs (LCP = length-changing prefix). – Peter Cordes Oct 01 '17 at 20:14
  • @CiroSantilliOurBigBook.com I ran into this problem when assembling Game Boy code and I didn't know the "short form" of `LD $FFnn,a` so I just made a macro that inlined the bytecode. – puppydrum64 Dec 07 '22 at 18:27
7

Minimal runnable example

.byte spits out bytes wherever you are. Whether there is a label or not pointing to the byte, does not matter.

If you happen to be in the text segment, then that byte might get run like code.

Carl mentioned it, but here is a complete example to let it sink in further: a Linux x86_64 implementation of true with a nop thrown in:

.global _start
_start:
    mov $60, %rax
    nop
    mov $0, %rdi
    syscall

produces the exact same executable as:

.global _start
_start:
    mov $60, %rax
    .byte 0x90
    mov $0, %rdi
    syscall

since nop is encoded as the byte 0x90.

One use case: new instructions

One use case is when new instructions are added to a CPU ISA, but only very edge versions of the assembler would support it.

So project maintainers may choose to inline the bytes directly to make it compilable on older assemblers.

See for example this Spectre workaround on the Linux kernel with the analogous .inst directive: https://github.com/torvalds/linux/blob/94710cac0ef4ee177a63b5227664b38c95bbf703/arch/arm/include/asm/barrier.h#L23

#define CSDB    ".inst  0xe320f014"

A new instruction was added for Spectre, and the kernel decided to hardcode it for the time being.

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
4

Here's an example with inline assembly:

#include <stdio.h>
void main() {
   int dst;
   // .byte 0xb8 0x01 0x00 0x00 0x00 = mov $1, %%eax
   asm (".byte 0xb8, 0x01, 0x00, 0x00, 0x00\n\t"
    "mov %%eax, %0"
    : "=r" (dst)
    : : "eax"  // tell the compiler we clobber eax
   );
   printf ("dst value : %d\n", dst);
return;
}

(See compiler asm output and also disassembly of the final binary on the Godbolt compiler explorer.)

You can replace .byte 0xb8, 0x01, 0x00, 0x00, 0x00 with mov $1, %%eax the run result will be the same. This indicated that it can be a byte which can represent some instruction eg- move or others.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
user3380830
  • 101
  • 2
1

The .byte is a directive that allows you to declare a constant byte only known through inspection without any context.

From the GNU Assembler Guide:

.byte  74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value.
Daniel Pereira
  • 1,785
  • 12
  • 10