A few days ago I was writing some code and noticed that copying RAM with memcpy was much, much faster than copying it in a for loop.
I have no measurements at hand right now (I may take some later), but as I remember, the same block of RAM that took about 300 ms or more to copy in the for loop was copied by memcpy in 20 ms or less.
Is that possible? Is memcpy hardware accelerated?