While trying to speed up my applications on non-NUMA (standard) PCs, I always found that the bottleneck was the call to malloc(),
because even on multi-core machines it is shared and synchronized between all the cores.
I have access to a PC with a NUMA architecture, running Linux and programming in C, and I have two questions:
- In a NUMA machine, since each core is provided with its own memory, will malloc() execute independently on each core/memory without blocking the other cores?
- On these architectures, how are calls to memcpy() handled? Can it be called independently on each core, or will calling it on one core block the others? I may be wrong, but I remember that memcpy() also had the same problem as malloc(), i.e. when one core is using it, the others have to wait.
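
To make the question concrete, here is a minimal sketch of the kind of test I have in mind, assuming POSIX threads on Linux. Each thread repeatedly calls malloc(), memcpy() and free() on its own private buffers, and the total wall-clock time is reported. The names `worker`, `worker_arg`, `run_workers` and `BUF_SIZE` are made up for illustration, and this is not a rigorous benchmark (an optimizing compiler may remove part of the work, for example).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE 4096 /* hypothetical buffer size, chosen arbitrarily */

/* Per-thread work item; all names here are illustrative. */
struct worker_arg {
    int iters;      /* number of malloc/memcpy/free rounds to run */
    long completed; /* rounds that finished successfully */
};

/* Each thread allocates, copies into, and frees its own private buffer. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    char src[BUF_SIZE];
    memset(src, 0x5A, sizeof src);

    for (int i = 0; i < arg->iters; i++) {
        char *dst = malloc(BUF_SIZE);
        if (dst == NULL)
            break;
        memcpy(dst, src, BUF_SIZE);
        if (dst[BUF_SIZE - 1] == src[BUF_SIZE - 1])
            arg->completed++;
        free(dst);
    }
    return NULL;
}

/*
 * Runs nthreads workers concurrently, each doing iters rounds.
 * Returns the total number of completed rounds, or -1 on error.
 * If elapsed_sec is not NULL, it receives the wall-clock time in seconds.
 */
long run_workers(int nthreads, int iters, double *elapsed_sec)
{
    assert(nthreads > 0 && iters >= 0);

    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    struct worker_arg *args = malloc((size_t)nthreads * sizeof *args);
    if (tids == NULL || args == NULL) {
        free(tids);
        free(args);
        return -1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        args[i].iters = iters;
        args[i].completed = 0;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }

    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].completed;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (elapsed_sec != NULL)
        *elapsed_sec = (double)(t1.tv_sec - t0.tv_sec)
                     + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    free(tids);
    free(args);
    return started == nthreads ? total : -1;
}
```

The idea would be to call run_workers() with 1 thread and then with one thread per core, keeping iters fixed, and compare the elapsed times: if the calls really serialized across cores, the time should grow roughly with the number of threads.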