Questions tagged [numa]

NUMA stands for Non-Uniform Memory Access. It is a general computer-architecture term (not specific to Linux) indicating that the hardware has multiple memory nodes, and that not all processing units have equal access to all memory.

As processors become faster, proximity to memory matters more for overall computing performance. NUMA systems address this by giving each group of processors its own local memory, which it can access faster than memory attached to other nodes.
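As a rough illustration of what "multiple memory nodes" looks like from software, here is a minimal sketch that lists each NUMA node and its CPUs on Linux. It assumes the kernel exposes the usual sysfs directory /sys/devices/system/node with one nodeN/cpulist file per node; the helper names parse_cpulist and numa_topology are made up for this example, and the parser handles only plain ranges such as 0-3,8-11, not strided ones.

```python
import os
import re

# Assumption: standard Linux sysfs location for NUMA node information.
SYSFS_NODE_DIR = "/sys/devices/system/node"


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(node_dir=SYSFS_NODE_DIR):
    """Return {node_id: [cpu, ...]}; empty dict if no NUMA info is exposed."""
    topology = {}
    if not os.path.isdir(node_dir):
        return topology
    for entry in os.listdir(node_dir):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # skip files such as 'online' or 'possible'
        cpulist_path = os.path.join(node_dir, entry, "cpulist")
        try:
            with open(cpulist_path) as f:
                topology[int(match.group(1))] = parse_cpulist(f.read())
        except OSError:
            continue  # unreadable node entry; ignore it in this sketch
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"node {node}: cpus {cpus}")
```

A result with a single node (or an empty result on systems without this sysfs directory) suggests the machine is not exposing more than one memory node to the operating system.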

307 questions
77
votes
7 answers

Poor memcpy Performance on Linux

We have recently purchased some new servers and are experiencing poor memcpy() performance. The memcpy() performance is 3x slower on the servers compared to our laptops. Server Specs Chassis and Mobo: SUPER MICRO 1027GR-TRF CPU: 2x Intel Xeon…
nick
  • 513
  • 1
  • 8
  • 12
35
votes
6 answers

Measuring NUMA (Non-Uniform Memory Access). No observable asymmetry. Why?

I've tried to measure the asymmetric memory access effects of NUMA, and failed. The Experiment Performed on an Intel Xeon X5570 @ 2.93GHz, 2 CPUs, 8 cores. On a thread pinned to core 0, I allocate an array x of size 10,000,000 bytes on core 0's NUMA…
James Brock
  • 3,236
  • 1
  • 28
  • 33
30
votes
5 answers

How do I know if my server has NUMA?

Coming from Java garbage collection, I came across JVM settings for NUMA. Out of curiosity, I wanted to check whether my CentOS server has NUMA capabilities or not. Is there a *nix command or utility that could grab this info?
pr4n
  • 2,918
  • 3
  • 30
  • 42
23
votes
2 answers

Scalable allocation of large (8MB) memory regions on NUMA architectures

We are currently using a TBB flow graph in which a) a parallel filter processes an array (in parallel with offsets) and puts processed results into an intermediate vector (allocated on the heap; mostly the vector will grow up to 8MB). These vectors…
muehlbau
  • 1,897
  • 13
  • 23
18
votes
5 answers

Too Low CPU Usage of Multithreaded Java Application on Windows

I am working on a Java application for solving a class of numerical optimization problems - large-scale linear programming problems to be more precise. A single problem can be split up into smaller subproblems that can be solved in parallel. Since…
Nils
  • 818
  • 7
  • 16
16
votes
1 answer

How are MMIO, IO and PCI configuration requests routed and handled by the OS in a NUMA system?

TL;DR How are MMIO, IO and PCI configuration requests routed to the right node in a NUMA system? Each node has a "routing table" but I'm under the impression that the OS is supposed to be unaware of it. How can an OS remap devices if it cannot…
Margaret Bloom
  • 41,768
  • 5
  • 78
  • 124
16
votes
1 answer

How does -XX:+UseNUMA affect JVM performance for systems with only one node?

There are numerous articles regarding the benefits of JVM NUMA-aware allocators. However, I could not find information about what performance impact the -XX:+UseNUMA flag may cause for single-node topologies like # numactl --hardware available: 1 nodes…
vsminkov
  • 10,912
  • 2
  • 38
  • 50
14
votes
4 answers

Can I get the NUMA node from a pointer address (in C on Linux)?

I've set up my code to carefully load and process data locally on my NUMA system. I think. That is, for debugging purposes I'd really like to be able to use the pointer addresses being accessed inside a particular function, which have been set up…
Rob_before_edits
  • 1,163
  • 9
  • 13
14
votes
3 answers

Why is my .Net app only using a single NUMA node?

I have a server with 2 NUMA nodes with 16 CPUs each. I can see all the 32 CPUs in task manager, first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA node 2) in the last 2 rows. In my app I am starting 64 threads, using Thread.Start().…
datadev
  • 311
  • 2
  • 5
  • 11
14
votes
1 answer

Shared Library bottleneck on NUMA machine

I'm using a NUMA machine (an SGI UV 1000) to run a large number of numerical simulations at the same time, each of which is an OpenMP job using 4 cores. However, running more than around 100 of these jobs results in a significant performance hit.…
acroz
  • 165
  • 10
12
votes
1 answer

How does _mm_mwait work?

How does _mm_mwait from pmmintrin.h work? (I mean not the asm for it, but the action itself and how it is carried out in NUMA systems. Store monitoring is easy to implement only on bus-based SMP systems with bus snooping.) What processors does…
osgx
  • 90,338
  • 53
  • 357
  • 513
12
votes
2 answers

Multithreading: Why are two programs better than one?

Briefly, my problem: I have a computer with 2 sockets of AMD Opteron 6272 and 64GB RAM. I run one multithreaded program on all 32 cores and get 15% lower speed compared with the case when I run 2 programs, each on one 16-core socket. How do…
klm123
  • 12,105
  • 14
  • 57
  • 95
12
votes
2 answers

How does NUMA architecture affect the performance of ActivePivot?

We are migrating an ActivePivot application to a new server (4 sockets Intel Xeon, 512GB of memory). After deploying we launched our application benchmark (that's a mix of large OLAP queries concurrent with real-time transactions). The measured…
Jack
  • 145
  • 1
  • 1
  • 11
11
votes
2 answers

NUMA-aware cache-aligned memory allocation

On Linux systems, the pthreads library provides us a function (posix_memalign) for cache alignment to prevent false sharing. And to choose a specific NUMA node of the architecture we can use the libnuma library. What I want is something that combines both. I…
Mustafa Zengin
  • 2,885
  • 5
  • 21
  • 24
11
votes
2 answers

Do malloc/memcpy functions run independently on NUMA?

While trying to increase the speed of my applications on non-NUMA / standard PCs I always found that the bottleneck was the call to malloc() because even on multi-core machines it is shared/synchronized between all the cores. I have available a PC with…
Abruzzo Forte e Gentile
  • 14,423
  • 28
  • 99
  • 173