
I'm trying to understand how the pointer chasing works (basically the access pattern) in the following benchmark:

https://github.com/google/multichase/blob/master/multichase.c

For the simple chase:

static void chase_simple(per_thread_t *t) {
  void *p = t->x.cycle[0]; ----> here p is pointing to the initial address

  do {
    x200(p = *(void **)p;) ----> I didn't get this statement; is x200 some compiler built-in?
  } while (__sync_add_and_fetch(&t->x.count, 200));

  // we never actually reach here, but the compiler doesn't know that
  t->x.dummy = (uintptr_t)p;
}

Where and how in the code is the access pattern made somewhat unpredictable, so that the H/W prefetcher doesn't guess it properly?

Milan
  • `x200` must be defined in one of the included files (directly or indirectly through nested includes). – Eric Postpischil Aug 29 '22 at 18:29
  • @EricPostpischil, does it mean some kind of geometric calculation, i.e. the product of values? – Milan Aug 29 '22 at 18:33
  • It is defined in one of the included files. To find out what it does, find its definition or its documentation. – Eric Postpischil Aug 29 '22 at 18:44
  • @EricPostpischil: I'd assume it's a CPP macro that repeats its operand 200 times, for explicit loop unrolling, not leaving it up to the compiler. Read `x200` as "times 200". That would make sense to amortize the cost of the atomic RMW to increment `t->x.count`. @Milan: You can check by looking at C preprocessor output, e.g. `gcc -E` – Peter Cordes Aug 29 '22 at 19:23
  • @PeterCordes, got it somewhat. Also, here in x200(p = *(void **)p;), what we're trying to access is the value at the address pointed to by the array members, and every time (200 times total) we get a new value? – Milan Aug 29 '22 at 19:35
  • No, `p` has no interaction with the atomic increment of the count. It's just dereferencing `p` to get another `void*`, following the linked list. aka "chasing" the chain of pointers. – Peter Cordes Aug 29 '22 at 19:41

1 Answer


I'd assume x200 is a CPP macro that repeats its operand 200 times, for loop unrolling. Read x200 as "times 200". Possible implementation:

#define x5(x)  x x x x x
...
#define x200(x)  x100(x) x100(x)

That would make sense to amortize the cost of the atomic RMW to increment t->x.count.

You can check by looking at C preprocessor output, e.g. gcc -E, or search for the macro definition in the included files.
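For illustration, here's a self-contained sketch of that kind of macro chain and how chase_simple uses it; the exact names and factors in multichase's headers may differ:

#include <stdio.h>

/* Hypothetical unrolling macros: each one just pastes its argument
   several times, so x200(stmt) expands to 200 copies of stmt. */
#define x5(x)    x x x x x
#define x25(x)   x5(x5(x))
#define x100(x)  x25(x) x25(x) x25(x) x25(x)
#define x200(x)  x100(x) x100(x)

int main(void) {
    /* Tiny circular chain built from a void* array:
       each slot holds the address of the next slot. */
    void *cycle[4];
    cycle[0] = &cycle[1];
    cycle[1] = &cycle[2];
    cycle[2] = &cycle[3];
    cycle[3] = &cycle[0];

    void *p = cycle[0];
    unsigned long count = 0;

    /* One "iteration" does 200 dependent loads back to back,
       then pays for a single counter update, like chase_simple. */
    x200(p = *(void **)p;)
    count += 200;

    printf("after %lu hops, p = %p\n", count, p);
    return 0;
}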


The actual statement being repeated, p = *(void **)p;, is just dereferencing a pointer to get another pointer. Often you'd have p = p->next; if you had struct foo *p, but with just a void* you need a cast to keep the compiler happy and get an assembly instruction like mov rax, [rax].

Without a struct that contains a member that's a pointer to the same struct type, dereferencing a pointer is going to change the type. C doesn't have a way to declare an infinitely-indirect pointer that just produces the same type when you deref it.
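For comparison, here's a minimal sketch of the two styles; the function names are made up for illustration, not taken from multichase:

/* Struct style: a member may point to the same struct type,
   so no cast is needed when following the chain. */
struct node {
    struct node *next;
};

static struct node *chase_struct(struct node *p, int hops) {
    while (hops--)
        p = p->next;          /* type stays struct node * */
    return p;
}

/* void* style, as in chase_simple: each slot just holds the address
   of the next slot, and the cast tells the compiler to load a void*. */
static void *chase_void(void *p, int hops) {
    while (hops--)
        p = *(void **)p;      /* type stays void * */
    return p;
}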


Where and how in the code is the access pattern made somewhat unpredictable, so that the H/W prefetcher doesn't guess it properly?

Nowhere in this function; it's just iterating through an existing linked list.

Look for the code that allocates space and stores pointers into it. It's probably allocated as one large array, so the benchmark has control over the layout, unlike if it did many small mallocs. A random shuffling of the integers 0..n-1 could be transformed into an array of pointers to pointers. Or something like that; perhaps you need to invert that mapping to make sure there aren't short cycles. Or maybe there's some simpler way to generate a linked list that covers all slots in a random order, with the last element pointing to the first so it's a closed loop that can be iterated as many times as you want.
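As an illustration of one way such a layout could be generated (a sketch, not necessarily what multichase actually does), Sattolo's algorithm produces a random permutation that is guaranteed to be one single cycle, so every slot is visited before the loop closes:

#include <stdio.h>
#include <stdlib.h>

enum { N = 1 << 15 };            /* number of pointer-sized slots */

int main(void) {
    void **slots = malloc(N * sizeof *slots);
    size_t *perm = malloc(N * sizeof *perm);
    if (!slots || !perm)
        return 1;

    /* Start with the identity mapping i -> i. */
    for (size_t i = 0; i < N; i++)
        perm[i] = i;

    /* Sattolo's algorithm: like Fisher-Yates, but j is drawn from
       0..i-1 (never j == i), which makes the resulting permutation
       a single cycle of length N, i.e. no short sub-cycles.
       rand() is only for illustration; a real benchmark would use
       a better RNG. */
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = perm[i];
        perm[i] = perm[j];
        perm[j] = tmp;
    }

    /* Slot i holds the address of slot perm[i]: a randomly laid-out,
       closed loop through all N slots. */
    for (size_t i = 0; i < N; i++)
        slots[i] = &slots[perm[i]];

    /* Chase it, just like chase_simple does. */
    void *p = slots[0];
    for (long hops = 0; hops < 1000000; hops++)
        p = *(void **)p;

    printf("ended at %p\n", p);
    free(slots);
    free(perm);
    return 0;
}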

Peter Cordes
  • Could it be that for this function here, https://github.com/google/multichase/blob/master/multichase.c#L171, the access pattern is made somewhat unpredictable? – Milan Aug 29 '22 at 19:39
  • @Milan: No. That's incrementing a value in each linked-list element as it walks through the list. But new `p` values are just loaded from memory, so again, it's exactly the same problem: the linked list elements have to be laid out in a random order in memory. That should be clear from the C code itself. If you're not familiar with C basics, I'd suggest looking at some linked-list examples, because that's what pointer-chasing is. – Peter Cordes Aug 29 '22 at 19:46
  • @Milan: That comment is at the top of the file. It's about the code that creates the linked list, not the code that walks through it! i.e. what factors should be considered when allocating the elements and randomizing what order they point to each other in. Searching for "random" in the file finds some code in `main` that sets members of a `genchase_args` object, so clearly the code using that is where the random layout is generated. – Peter Cordes Aug 29 '22 at 19:57
  • @Milan: Actually, that comment is right above some `#define` constants that are used for that, like total size, stride, and so on. Oh, does it actually stride by a constant amount, not a random shuffle? A large enough stride (like a whole page) can defeat HW prefetch, but then you'd have lots of TLB misses, too. Unless it loops back to use other elements of the same pages. – Peter Cordes Aug 29 '22 at 20:00
  • @PeterCordes, maybe I should print the addresses of the linked list to see if the address that gets printed is somewhat random, or advancing by a constant amount. For a larger page size, could there be more latency as TLB misses are more frequent, right? – Milan Aug 29 '22 at 20:05
  • @Milan: Sure, printing `p` inside the loop would be a simple way to see what the generating code ended up doing. Having more total pages would mean TLB misses were more common. But having a larger page *size* would reduce TLB misses for the same array size in bytes, as the same number of TLB entries would cover more memory. (Some CPUs would use a different set of TLB entries for hugepages than for normal-size pages. x86-64 page sizes are fixed at 4k normal, 2M largepage. But AArch64 can be configured to use different page sizes, like up to 16k normal-page size.) – Peter Cordes Aug 29 '22 at 20:15