
I'm looking for a C/C++ program to test how long it takes to access a fixed piece of memory, specifically in RAM.

How do I ensure that I am measuring the access time of RAM, not of cached or TLB-resident data?

For example, can I "disable" all cache/TLB?

Or can I specify a specific address in RAM to write/read only?

On the other hand, how would I ensure I am only testing cache?

Are there ways to tell the compiler where to store and read data from: cache or RAM?

For example, is there a well-known standard program (in one of these books?) that is known for this test?

I did see this, but I do not understand how, by adjusting the size of the list, you can control whether the memory accesses hit L1 cache, L2 cache, or main memory: measuring latencies of memory

How can one correctly program this test?

P.S.
    You can request cache memory by using the keyword `register`... not sure about using slower memory... – Fiddling Bits Nov 25 '13 at 01:12
  • @bit-fiddling-code-monkey: please show an example, I can add that to the question... – P.S. Nov 25 '13 at 01:13
  • `register` is a compiler hint to use a CPU register, which is completely different from cache memory.... – Tony Delroy Nov 25 '13 at 01:16
  • @TonyD You are absolutely correct... – Fiddling Bits Nov 25 '13 at 01:18
  • @tony-d: is there a compiler flag for cache only? – P.S. Nov 25 '13 at 01:21
  • @P.S. no Standard flags for things like that, no. Few CPUs support software control of the caches, and when they do it's likely to be a privileged operation (i.e. need root/admin). Some CPUs do have instructions for controlling pre-fetch, and for accessing memory content without populating the cache - e.g. MOVNTI (see http://www.rz.uni-karlsruhe.de/rz/docs/VTune/reference/vc195.htm) - you might be able to make use of that in your benchmarking. – Tony Delroy Nov 25 '13 at 01:42
  • @TonyD: does what Dietrich Epp said in the linked question make sense? – P.S. Nov 25 '13 at 02:01
  • @P.S. Most of it makes sense to me, though I'm not sure why linked pointers are particularly useful for the measurement... though they are one way of pre-deciding the traversal order in a way that invalidates the cache, then allowing fast page-to-page jumps. – Tony Delroy Nov 25 '13 at 04:02

1 Answer


Basically, as the list grows you'll see performance worsen in steps as each successive layer of caching is overwhelmed. The idea is simple: if a cache holds the last N units of memory you've accessed, then looping around a buffer even one unit larger than N should ensure constant cache misses. (There are more details/caveats in the "measuring latencies of memory" answer you link to in your question.)

You should be able to get some idea of the size of the largest cache that might front your RAM from the hardware documentation; as long as you operate on more memory than that, you should be measuring physical RAM times.

Tony Delroy
  • http://ark.intel.com/products/56056/ - can I assume that if I am writing more than 2MB to "memory" and then reading it back, it is missing the cache, or does it have to be in a loop? – P.S. Nov 25 '13 at 01:17
  • There's the basic idea... subject to pre-fetch issues. A loop's potentially useful to give you enough samples to avoid the need for high-resolution timing (there are serious issues with reading the real time clock register unless you pin the thread to a core and don't care about possible CPU clockrate changes due to power saving modes etc. during the test, and on Linux and Windows - often with QueryPerformanceCounter et al too). You might want to read over http://www.sisoftware.net/?d=qa&f=ben_mem_latency - good background on this stuff. – Tony Delroy Nov 25 '13 at 01:33