I know a little assembly (NASM), and I want to perform a string operation (checking whether a substring is present) using SSE4.2, so I learned how PCMPESTRI and PCMPISTRM work. I am stuck on the data transfer from memory to an XMM register. Basically, I want to take input from the command line (e.g. ./a.out ABCD) and transfer it to the xmm1 register. The command-line input can be a string of any length (from 1 to more than 16 bytes). It is stored with a terminating 0 appended (i.e. ABCD\0), and we get its starting address from the stack. So how do I align the command-line data to 16 bytes (ABCD\0\0\0\0... up to 16)?

Also, I don't want to allocate memory with the brk system call, copy all the command-line data into it, and then transfer it to xmm1. (I want to do the substring check in one go, rather than first copying everything to newly allocated memory, which may increase execution time.)

I tried to do this:-

section .data
align 16 ; I thought that command line data is stored in data section and may align to 16. :-(
 ...

section .bss
...
section .text
...

But it didn't work. So how do I transfer the data to an XMM register, given that the input can be of variable length (from 1 to more than 16 bytes)?

Which move instruction should I use?

How should I solve this data movement when the input comes from the command line and can be of any length?

My CPU flags (from /proc/cpuinfo) include: sse sse2 ssse3 sse4_1 sse4_2

Peter Cordes
Sanket

1 Answer

Command line args are on the stack, not in .data. Aligning .data is totally irrelevant.

Related: Is it safe to read past the end of a buffer within the same page on x86 and x64? You don't align your buffer; you just check that a 16-byte load won't cross into a new page, i.e. that (ptr & 4095) <= (4096-16).
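The page-crossing check can be sketched as follows. This is a minimal illustration in C rather than NASM, assuming 4 KiB pages; the names `PAGE_BYTES`, `VEC_BYTES` and `load16_stays_in_page` are made up for this example. In assembly the same test is just an `and` with 4095 followed by a `cmp` against 4080.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumption: 4 KiB pages, the usual x86-64 Linux default */
#define VEC_BYTES  16u    /* width of one XMM load */

/* Hypothetical helper: returns 1 if a 16-byte load starting at p stays
   inside the page that contains p, 0 if it would spill into the next page.
   The pointer is only inspected, never dereferenced. */
int load16_stays_in_page(const void *p)
{
    assert(p != 0);
    uintptr_t offset_in_page = (uintptr_t)p & (PAGE_BYTES - 1);
    return offset_in_page <= PAGE_BYTES - VEC_BYTES;
}
```

When this returns 1, an unaligned 16-byte load (movdqu) from `p` cannot fault because of a page boundary, even if the string itself is shorter than 16 bytes.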

If that check fails (or you can't establish it), you can't safely use movdqu and have to fall back to another strategy: for example, a 16-byte load of the last 16 bytes of the page, combined with a pshufb control vector looked up from a sliding window over db 0,1,2,3,4,...,-1,-1,-1 that shuffles the bytes you actually want to the bottom of an XMM register.
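To make the sliding-window idea concrete, here is a scalar model in C, not real SIMD code. `pshufb_model` imitates what the pshufb instruction does byte by byte, and `memcpy` from the aligned block stands in for the aligned 16-byte load. All names (`shuffle_window`, `pshufb_model`, `load_tail_zero_padded`) are hypothetical, and the sketch assumes the string ends before the aligned block does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sliding window: indices 0..15 followed by sixteen -1 (0xFF) bytes.
   Reading 16 bytes starting at offset k gives k, k+1, ..., 15, -1, -1, ... */
static const uint8_t shuffle_window[32] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
};

/* Scalar model of pshufb: a control byte with its high bit set produces 0,
   otherwise its low 4 bits select a byte of src. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t ctrl[16])
{
    uint8_t out[16];
    for (int i = 0; i < 16; i++)
        out[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    memcpy(dst, out, 16);
}

/* Read the aligned 16-byte block containing p (an aligned block never
   crosses a page), then move the bytes from p onward down to position 0
   and zero-fill the rest. */
void load_tail_zero_padded(uint8_t dst[16], const uint8_t *p)
{
    uintptr_t misalign = (uintptr_t)p & 15;
    const uint8_t *block = p - misalign;  /* 16-byte aligned start */
    uint8_t raw[16];
    memcpy(raw, block, 16);               /* stands in for an aligned load */
    pshufb_model(dst, raw, shuffle_window + misalign);
}
```

In actual assembly the control vector would be loaded with movdqu from the window at offset `misalign` and applied with a single pshufb (SSSE3), so the per-byte loop exists only in this model.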

Processing unaligned implicit-length strings with SIMD is generally inconvenient because the semantics of what's safe to read depend on looking one byte at a time. (Except for taking advantage of the fact that memory protection has page granularity).

Peter Cordes
  • I'm actually a newbie. I read the link (Related: Is it safe to read past...) multiple times but didn't understand much. Further, I wrote code to display 200 characters; it printed the command-line data plus garbage, and I didn't get a segmentation fault: `mov rax, 1` – Sanket Mar 11 '20 at 13:52
  • (write system call) ; `mov rdi,1 ; mov rsi,qword[rsp + 16] (address of command line) ; mov rdx,200 (length to display) ; syscall`. After running `./a.out sanket` it printed: sanketsdCLUTTER_IM_MODULE=ximLS_COLORS=rs...... I didn't check the boundary. I need only 16 bytes, so is it OK to read directly? – Sanket Mar 11 '20 at 14:11
  • I think it is analogous to int arr[5]; here an array with 5 elements is declared, but we can access arr[15] without getting a segmentation fault. Is my thinking correct? Thank you so much in advance. – Sanket Mar 11 '20 at 14:17
    Yes, it's safe unless your process is run with an empty `envp[]` so your first arg is right at the very top of your stack mapping. i.e. there's an unmapped page with a few bytes past the end of the string pointed to by `argv[1]`. It's not *always* safe to read `arr[15]` unless you know `arr` is 16-byte aligned. And BTW, from _start `[rsp+16]` is not the address of your whole command line, just your first arg. Maybe you're used to Windows where the OS only passes a flat string? In POSIX, the caller passes separate args separately, as an array of strings. – Peter Cordes Mar 11 '20 at 16:46