
In an assembly class I took last year (x86-64, Intel syntax, NASM assembler, Linux), the lecturer mentioned an instruction that does essentially what the function memcpy(3) does, but as a single instruction.

For some reason I can't remember that instruction, and I haven't been able to find it even though I have been searching for a long time. I need it for my current project, where it is crucial.

I would love to get some help with that.

Thanks all.

(BTW: memcpy(3) is the function void *memcpy(void *dest, const void *src, size_t n); declared in the header string.h in libc.)

  • `rep movsb` (or `rep movsw`, `rep movsd`, `rep movsq`) – ecm Aug 30 '20 at 14:47
  • You have to set up the length, which takes at least one more instruction, plus the setup of the addresses, so it is not a single-instruction solution. – old_timer Aug 30 '20 at 15:23
  • You can disassemble the code from a particular C library (there is no reason to assume any two C libraries use the same implementation) and see how they do it. They likely won't use `rep movsb`, as it is not efficient; it may take many lines of code to optimize performance. – old_timer Aug 30 '20 at 15:25
  • @ecm Great answer. Thank you very much. – J_Developer 2000 Aug 30 '20 at 15:29
  • @old_timer Thank you very much. Could you please explain why `rep movsb` is not efficient? It looks to me like a regular loop counted by `rcx` that copies from one address to another. I think that is the most efficient and native solution (if not, please correct me). – J_Developer 2000 Aug 30 '20 at 16:26
  • Byte transfers are not as efficient as word, double-word, or quad-word transfers. The buses are not byte sized, so a byte write causes a read-modify-write at some layer, which you want to avoid. The SRAM in your cache is likely 32 or 64 bits wide, so each byte write causes a read-modify-write in the cache, but if you do a 32- or 64-bit transfer then it is simply a write. Even a 32-bit transfer against a 64-bit-wide cache SRAM gives 4 times fewer read-modify-writes than 8-bit transfers. Then you add bus overhead for each transfer: 4 times more bus transfers, 4 times the microcode executed, etc. – old_timer Aug 30 '20 at 16:29
  • @old_timer Thank you very much! That's what I thought you meant. – J_Developer 2000 Aug 30 '20 at 16:58
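
To make the comments above concrete, here is a minimal sketch of a copy built on `rep movsb`, written as C with GCC/Clang inline assembly so it can be compiled and checked. The helper name `copy_rep_movsb` is hypothetical and not from the thread. The sketch assumes x86-64, the System V ABI (direction flag clear on function entry), and non-overlapping buffers. It also shows the setup old_timer mentions: the destination, source, and count must already be in RDI, RSI, and RCX before the instruction runs.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): copy n bytes from src to dest
 * using rep movsb. Assumes x86-64, GCC or Clang, and non-overlapping
 * buffers. Returns dest, like memcpy(3). */
static void *copy_rep_movsb(void *dest, const void *src, size_t n)
{
    void *ret = dest;

    /* rep movsb copies RCX bytes from [RSI] to [RDI], advancing both
     * pointers and decrementing RCX to zero. The constraints place
     * dest in RDI ("D"), src in RSI ("S"), and n in RCX ("c"); all three
     * are read-write because the instruction modifies them.
     * Assumption: DF is clear (System V ABI), so the copy runs forward. */
    __asm__ volatile ("rep movsb"
                      : "+D"(dest), "+S"(src), "+c"(n)
                      :
                      : "memory");

    return ret;
}
```

In NASM the same thing would be three register loads (`rdi`, `rsi`, `rcx`) followed by `rep movsb`. Whether this beats a library `memcpy` depends on the CPU and the copy size, so it is worth measuring rather than assuming.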

0 Answers