This is a practical question for developers who are used to multithreading for intensive calculations.
On a machine with a typical architecture, built around a multicore Intel or AMD processor, is it efficient to use multithreading to repeat a simple calculation over a large area of memory?
For instance, imagine that I want to increment a huge array of integers (or perform some other very simple operation on them) and split the workload between several threads, each working on its own sub-array.
Depending on the number of cores and on whether the processor is hyperthreaded, the machine can run some number N of simultaneous threads. Can the speed of my calculation be multiplied by something close to N? Or will a bottleneck in RAM access arise much sooner?
A typical machine my company can rent has N = 40. But if the bottleneck already appears at 5 threads, those machines won't be useful for our purpose.
I know that, in theory, RAM access can be a bottleneck, but I am looking for feedback from practical experience with this kind of fast operation repeated over a large region of memory.