Questions tagged [cache-locality]

25 questions
17
votes
4 answers

Confused between Temporal and Spatial locality in real life code

I was reading this question, I wanted to ask more about the code that he showed i.e for(i = 0; i < 20; i++) for(j = 0; j < 10; j++) a[i] = a[i]*j; The questions are, I understand temporal locality, I think that references to i and j…
user379888
6
votes
2 answers

Does _mm_clflush really flush the cache?

I'm trying to understand how the hardware cache works by writing and running a test program: #include #include #include #define LINE_SIZE 64 #define L1_WAYS 8 #define L1_SETS 64 #define L1_LINES …
xiaogw
  • 653
  • 8
  • 18
5
votes
2 answers

CPU spatial cache locality in array iteration

My understanding of the L1 cache was that a memory fetch loads a cache line. Assuming the cache line size is 64 bytes, if I access memory at address p, it will load the entire block from p to p + 64 into the cache. Thus, it is best to iterate…
user1413793
  • 9,057
  • 7
  • 30
  • 42
3
votes
1 answer

Why is inserting sorted keys into std::set so much faster than inserting shuffled keys?

I was accidentally surprised to found that inserting sorted keys into std::set is much much faster than inserting shuffled keys. This is somewhat counterintuitive since a red-black tree (I verified that std::set is implemented as a red-black tree on…
Leon Cruz
  • 345
  • 3
  • 11
3
votes
2 answers

Understanding spatial and temporal locality

I was studying for my architecture final and came across the following lines of code: for(i = 0; i <= N ;i++){ a[i] = b[i] + c[i]; } The question is: "How does this code snippet demonstrate examples of temporal and spatial locality? Be sure to…
Carlos Romero
  • 181
  • 1
  • 13
2
votes
2 answers

Cache misses when accessing an array in nested loop

So I have this question from my professor, and I can not figure out why vector2 is faster and has less cache misses than vector1. Assume that the code below is a valid compilable C code. Vector2: void incrementVector2(INT4* v, int n) { for (int…
Pengibaby
  • 373
  • 2
  • 11
2
votes
0 answers

In Apache Spark, how to make a task to always execute on the same machine?

In its simplest form, RDD is merely a placeholder of chained computations that can be arbitrarily scheduled to be executed on any machine: val src = sc.parallelize(0 to 1000) val rdd = src.mapPartitions { itr => …
tribbloid
  • 4,026
  • 14
  • 64
  • 103
2
votes
2 answers

Importance of padding in Dynamic Memory Allocation

I am trying to implement a heap (implicit free list with header/footer) and deciding on whether I should add padding to it. What are the tangible benefits of adding pads? I read that it somehow reduces fragmentation, but I don't really understand…
2
votes
0 answers

How to benchmark random access with JMH?

I was trying to observe the effects of CPU cache spatial locality by benchmarking sequential/random reads to an array with JMH. Interestingly, the results are almost the same. So I wonder, is this the correct JMH approach? Below is the test class…
pistolPanties
  • 1,880
  • 13
  • 18
2
votes
1 answer

Why can't modern compilers optimize row-major order accesses in loops?

In the textbook Computer Systems: a Programmer's Perspective there are some impressive benchmarks for optimizing row-major order access. I created a small program to test for myself if a simple change from row-major access to column-major access…
1
vote
0 answers

Improving cache locality of binary search by doing local linear search?

Binary search of a sorted array may have poor cache locality, due to random access of memory, but linear search is slow for a large array. Is it possible to design a hybrid algorithm? For example, you could divide a sorted array of length 100 into…
felix
  • 647
  • 1
  • 6
  • 10
1
vote
0 answers

Detect whether a cache line is reused due to spatial or temporal locality

Is there a practical tool to detect whether a cache line is reused (a cache miss is avoided) due to either spatial or temporal locality? I could not find a related discussion in cachegrind. I was able to find only this and this papers but not the…
Kadir
  • 1,345
  • 3
  • 15
  • 25
1
vote
0 answers

java mechanical sympathy trough thread pinning

Given we have an application that is heavily polluted with concurrency constructs multiple techniques are used (different people worked without clear architecture in mind), multiple questionable locks that are there "just in case", thread safe…
vach
  • 10,571
  • 12
  • 68
  • 106
1
vote
5 answers

data locality for implementing 2d array in c/c++

Long time ago, inspired by "Numerical recipes in C", I started to use the following construct for storing matrices (2D-arrays). double **allocate_matrix(int NumRows, int NumCol) { double **x; int i; x = (double **)malloc(NumRows *…
John Smith
  • 1,027
  • 15
  • 31
0
votes
2 answers

Do different ways of initializing a Vec create different memory layout?

fn main() { let vec0 = vec![0; 10]; let mut vec1 = vec![]; for _ in 0..10 { vec1.push(0); } assert_eq!(vec0.len(), vec1.len()); } In this example, vec0 and vec1 are identical when accessing their items with index. But…
Rahn
  • 4,787
  • 4
  • 31
  • 57
1
2