Questions tagged [numa]

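NUMA stands for Non-Uniform Memory Access. It is a general term, not specific to Linux, indicating that the hardware has multiple memory nodes and that not all processing units have equal access to all memory: a CPU reaches memory attached to its own node faster than memory attached to another node.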

As processors become faster, proximity to memory becomes more important to overall computing performance. NUMA systems address this problem by building closer connections between specific computing resources and specific memory.
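To see whether a Linux machine actually exposes more than one memory node, one option is to read the kernel's sysfs node list. This is a minimal sketch, not taken from any of the questions or answers listed here. It assumes Linux with NUMA support, where `/sys/devices/system/node/online` holds a range list such as `0` or `0-1`; the helper names `parse_node_list` and `online_numa_nodes` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: on Linux with NUMA support the kernel lists online memory nodes
# here as a range string, e.g. "0" (single node) or "0-1" (two nodes).
NODE_ONLINE_PATH = Path("/sys/devices/system/node/online")


def parse_node_list(text: str) -> list[int]:
    """Expand a range list such as "0,2-3" into [0, 2, 3] (hypothetical helper)."""
    nodes: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            lo, hi = part.split("-", 1)
            nodes.extend(range(int(lo), int(hi) + 1))
        else:
            nodes.append(int(part))
    return nodes


def online_numa_nodes(path: Path = NODE_ONLINE_PATH) -> list[int] | None:
    """Return online node IDs, or None if the file cannot be read."""
    try:
        return parse_node_list(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    nodes = online_numa_nodes()
    if nodes is None:
        print("No NUMA information found (non-Linux, or kernel without NUMA support)")
    elif len(nodes) > 1:
        print(f"NUMA system with {len(nodes)} nodes: {nodes}")
    else:
        print("Single memory node: memory access looks uniform to software")
```

A file containing `0-1` yields `[0, 1]`, meaning two memory nodes; a single node means the distinction does not matter to software on that machine. The `numactl --hardware` command, which appears in one of the questions below, reports the same node count along with per-node memory sizes.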

307 questions

7 answers

Poor memcpy Performance on Linux

We have recently purchased some new servers and are experiencing poor memcpy() performance. The memcpy() performance is 3x slower on the servers compared to our laptops. Server Specs Chassis and Mobo: SUPER MICRO 1027GR-TRF CPU: 2x Intel Xeon…

asked Apr 01 '14 at 18:14

nick

6 answers

Measuring NUMA (Non-Uniform Memory Access). No observable asymmetry. Why?

I've tried to measure the asymmetric memory access effects of NUMA, and failed. The Experiment Performed on an Intel Xeon X5570 @ 2.93GHz, 2 CPUs, 8 cores. On a thread pinned to core 0, I allocate an array x of size 10,000,000 bytes on core 0's NUMA…

c++ linux performance linux-kernel numa

asked Aug 31 '11 at 15:23

James Brock

5 answers

How do I know if my server has NUMA?

Hopping from Java Garbage Collection, I came across JVM settings for NUMA. Out of curiosity, I wanted to check whether my CentOS server has NUMA capabilities. Is there a *nix command or utility that could grab this info?

linux jvm kernel processor numa

asked Jun 20 '12 at 18:42

pr4n

2 answers

Scalable allocation of large (8MB) memory regions on NUMA architectures

We are currently using a TBB flow graph in which a) a parallel filter processes an array (in parallel with offsets) and puts processed results into an intermediate vector (allocated on the heap; mostly the vector will grow up to 8MB). These vectors…

c++ memory-management parallel-processing tbb numa

asked Dec 10 '12 at 15:14

muehlbau

5 answers

Too Low CPU Usage of Multithreaded Java Application on Windows

I am working on a Java application for solving a class of numerical optimization problems - large-scale linear programming problems to be more precise. A single problem can be split up into smaller subproblems that can be solved in parallel. Since…

java multithreading java-native-interface jvm-hotspot numa

asked Nov 14 '19 at 20:29

Nils

1 answer

How are MMIO, IO and PCI configuration request routed and handled by the OS in a NUMA system?

TL;DR How are MMIO, IO and PCI configuration requests routed to the right node in a NUMA system? Each node has a "routing table" but I'm under the impression that the OS is supposed to be unaware of it. How can an OS remap devices if it cannot…

io x86 cpu-architecture numa

asked Jul 30 '19 at 18:31

Margaret Bloom

1 answer

How does -XX:+UseNUMA affect JVM performance for systems with only one node?

There are numerous articles regarding the benefits of JVM NUMA-aware allocators. However, I could not find information about what performance impact the -XX:+UseNUMA flag may have on single-node topologies like # numactl --hardware available: 1 nodes…

java performance jvm numa

asked Sep 13 '16 at 11:56

vsminkov

4 answers

Can I get the NUMA node from a pointer address (in C on Linux)?

I've set up my code to carefully load and process data locally on my NUMA system. I think. That is, for debugging purposes I'd really like to be able to use the pointer addresses being accessed inside a particular function, which have been set up…

c linux multithreading memory numa

asked Nov 02 '11 at 20:27

Rob_before_edits

3 answers

Why is my .Net app only using single NUMA node?

I have a server with 2 NUMA nodes with 16 CPUs each. I can see all 32 CPUs in Task Manager: the first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA node 2) in the last 2 rows. In my app I am starting 64 threads, using Thread.Start().…

c# .net multithreading numa

asked Nov 09 '14 at 16:13

datadev

1 answer

Shared Library bottleneck on NUMA machine

I'm using a NUMA machine (an SGI UV 1000) to run a large number of numerical simulations at the same time, each of which is an OpenMP job using 4 cores. However, running more than around 100 of these jobs results in a significant performance hit.…

linux linux-kernel shared-libraries hpc numa

asked Sep 12 '12 at 13:08

acroz

1 answer

How does _mm_mwait work?

How does _mm_mwait from pmmintrin.h work? (I don't mean the asm for it, but the action itself and how it is carried out in NUMA systems. Store monitoring is easy to implement only on bus-based SMP systems with bus snooping.) What processors does…

atomic intrinsics numa sse3

asked Apr 02 '10 at 02:23

osgx

2 answers

Multithreading: Why two programs is better than one?

In short, my problem: I have a computer with 2 sockets of AMD Opteron 6272 and 64GB RAM. When I run one multithreaded program on all 32 cores, it is 15% slower than when I run 2 programs, each on one 16-core socket. How do…

c++ multithreading pthreads numa

asked Nov 13 '13 at 09:35

klm123

2 answers

How does NUMA architecture affect the performance of ActivePivot?

We are migrating an ActivePivot application to a new server (4 sockets Intel Xeon, 512GB of memory). After deploying we launched our application benchmark (that's a mix of large OLAP queries concurrent to real-time transactions). The measured…

java olap numa activepivot

asked Oct 31 '12 at 14:40

Jack

2 answers

NUMA aware cache aligned memory allocation

In Linux systems, the pthreads library provides us a function (posix_memalign) for cache alignment to prevent false sharing. And to choose a specific NUMA node of the architecture we can use the libnuma library. What I want is something that combines both. I…

linux caching pthreads malloc numa

asked Nov 16 '11 at 15:29

Mustafa Zengin

2 answers

Do malloc/memcpy functions run independently on NUMA?

While trying to increase the speed of my applications on non-NUMA / standard PCs, I always found that the bottleneck was the call to malloc() because, even on multi-core machines, it is shared/synchronized between all the cores. I have available a PC with…

c memory malloc memcpy numa

asked Mar 29 '11 at 10:21

Abruzzo Forte e Gentile
