
I want to write a shell script/command which uses commonly-available binaries, the /sys filesystem or other facilities to calculate the theoretical maximum bandwidth for the RAM available on a given machine.

Notes:

  • I don't care about latency, just bandwidth.
  • I'm not interested in the effects of caching (e.g. the CPU's last-level cache), but in the bandwidth of reading from RAM proper.
  • If it helps, you may assume a "vanilla" Intel platform, and that all memory DIMMs are identical; but I would rather you not make this assumption.
  • If it helps, you may rely on root privileges (e.g. using sudo)
einpoklum
  • which bandwidth are you interested in? CPU <--> RAM? I/O <--> RAM? and by RAM do we mean Virtual Memory or direct access to physical memory? What about L3 (or last) cache? Did you have a look at https://superuser.com/questions/827207/linux-ram-throughput-statistic-for-a-given-pid ? – diginoise Jul 20 '18 at 14:13
  • @diginoise: I asked about the RAM, not the CPU cache. I meant how much you can read from RAM to everywhere on the system; typically this would be how much you can read from the different memory banks to the various CPU sockets on the system. – einpoklum Jul 20 '18 at 14:24
  • Are you wanting to *benchmark*, like with `time dd if=/dev/zero of=/dev/null bs=1g count=200` or something? If not, the `[benchmarking]` tag doesn't make sense. – Peter Cordes Oct 27 '18 at 01:58
  • You say you want the "theoretical" max bandwidth, which means not a benchmark, but rather reading the DRAM parameters and bus speed and simply multiplying out the resultant bandwidth (probably looking up the number of memory channels based on the CPU model). If you do want a benchmark, [STREAM](https://www.cs.virginia.edu/stream/) is one de-facto standard. Various benchmark packages offer their own memory bandwidth tests. TinyMemBench is another. – BeeOnRope Oct 28 '18 at 00:33
  • @BeeOnRope: I see what you mean. I'm dropping the `[benchmarking]` tag. – einpoklum Oct 28 '18 at 07:46
  • @einpoklum - so to be clear then, you are looking to _calculate_ this theoretical value, based on hardware characteristics such as the RAM frequency and number of memory channels, rather than measure it? – BeeOnRope Nov 03 '18 at 01:10
  • @BeeOnRope: Yes, sorry for the unclarity. – einpoklum Nov 03 '18 at 08:47

2 Answers


@einpoklum, you should have a look at Performance Counter Monitor, available at https://github.com/opcm/pcm. It will give you the measurements that you need. I do not know whether it supports kernel 2.6.32.

Alternatively you should also check Intel's EMON tool which promises support for kernels as far back as 2.6.32. The user guide is listed at https://software.intel.com/en-us/download/emon-user-guide, which implies that it is available for download somewhere on Intel's software forums.

  • While I appreciate the link, I was after an answer that uses binaries already available on most systems, not something I need to download and build (which in some cases I don't have the ability to do). – einpoklum Oct 26 '18 at 17:40

I'm not aware of any standalone tool that does it, but for Intel chips only, if you know the "ARK URL" for the chip, you could get the maximum bandwidth using a combination of a tool to query ARK, like curl, and something to parse the returned HTML, like xmllint --html --xpath.

For example, for my i7-6700HQ, the following works:

curl -s 'https://ark.intel.com/products/88967/Intel-Core-i7-6700HQ-Processor-6M-Cache-up-to-3_50-GHz' | \
xmllint --html --xpath '//li[@class="MaxMemoryBandwidth"]/span[@class="value"]/span/text()' - 2>/dev/null

This returns 34.1 GB/s which is the maximum theoretical bandwidth of my chip.

The primary difficulty is determining the ARK URL, which doesn't correspond in an obvious way to the CPU brand string. One solution would be to find the CPU model on an index page like this one, and follow the link.

This gives you the maximum theoretical bandwidth, which can be calculated as (number of memory channels) x (transfer width) x (data rate). The data rate is the number of transfers per unit time, and is usually the figure given in the name of the memory type, e.g., DDR4-2133 has a data rate of 2133 million transfers per second. Alternatively, you can calculate the data rate as the product of the bus clock (about 1067 MHz in this case) and the data rate multiplier (2 for DDR technologies).

For my CPU, this calculation gives 2 memory channels * 8 bytes/transfer * 2133 million transfers/second = 34.128 GB/s, consistent with the ARK figure.
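As a minimal sketch, the same arithmetic can be done in the shell with awk. The function name `peak_bw_gbs` and its argument order are made up for illustration; you supply the channel count, the bytes per transfer, and the data rate in millions of transfers per second yourself.

```shell
#!/bin/sh
# peak_bw_gbs CHANNELS BYTES_PER_TRANSFER MEGATRANSFERS_PER_SECOND
# (hypothetical helper, not a standard tool)
# Prints channels * width * data rate in decimal GB/s (10^9 bytes/s).
peak_bw_gbs() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v ch="$1" -v width="$2" -v mts="$3" \
        'BEGIN { printf "%.3f\n", ch * width * mts / 1000 }'
}

# The i7-6700HQ example from the text: 2 channels, 8 bytes, 2133 MT/s.
peak_bw_gbs 2 8 2133    # prints 34.128
```

Calling it with a single channel, `peak_bw_gbs 1 8 2133`, gives half that figure.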

Note that the theoretical maximum as reported by ARK might be lower or higher than the theoretical maximum on your particular system for various reasons, including:

  • Fewer memory channels populated than the maximum number of channels. For example, if I only populated one channel on my dual channel system, theoretical bandwidth would be cut in half.
  • Not using the maximum speed supported RAM. My CPU supports several RAM types (DDR4-2133, LPDDR3-1866, DDR3L-1600) with varying speeds. The ARK figure assumes you use the fastest possible supported RAM, which is true in my case, but may not be true on other systems.
  • Over- or under-clocking of the memory bus, relative to the nominal speed.
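To account for what is actually installed, one option is to read the per-DIMM data width and configured speed from `dmidecode` (which needs root) and sum them up. The sketch below is only a rough upper bound under loud assumptions: the function name `dimm_upper_bound_gbs` is made up, the field names ("Size", "Data Width", "Configured Memory Speed" or the older "Configured Clock Speed") are those printed by the dmidecode versions I know of and may differ on yours, and it treats every populated DIMM as sitting on its own channel, because dmidecode does not report the channel count directly. With two DIMMs per channel it will overestimate.

```shell
#!/bin/sh
# dimm_upper_bound_gbs (hypothetical helper)
# Reads `dmidecode --type 17` output on stdin and prints the sum over
# populated DIMMs of (data width in bytes) * (configured MT/s), in GB/s.
# ASSUMPTION: one populated DIMM per memory channel.
dimm_upper_bound_gbs() {
    LC_ALL=C awk -F': *' '
        # Add the DIMM seen so far to the total, then reset its fields.
        function account() {
            if (populated && width > 0 && speed > 0)
                total += (width / 8) * speed    # MB/s for this DIMM
            populated = 0; width = 0; speed = 0
        }
        /^Memory Device/          { account() }
        /^[ \t]*Size:/            { populated = ($2 !~ /^No Module/) }
        /^[ \t]*Data Width:/      { width = $2 + 0 }   # "Unknown" becomes 0
        /^[ \t]*Configured (Memory|Clock) Speed:/ { speed = $2 + 0 }
        END { account(); printf "%.3f\n", total / 1000 }
    '
}
```

Usage would be along the lines of `sudo dmidecode --type 17 | dimm_upper_bound_gbs`. If you know the real channel count, multiplying it by one DIMM's width and speed is the more faithful calculation.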

Once you get the correct theoretical figure, you won't actually reach this figure in practice, due to various factors including the following:

  • Inability to saturate the memory interface from one or more cores due to limited concurrency for outstanding requests, as described in the section "Latency Bound Platforms" in this answer.
  • Hidden doubling of bandwidth implied by writes that need to read the line before writing it.
  • Various low-level factors relating to the DRAM interface that prevent 100% utilization, such as the cost to open pages, the read/write turnaround time, refresh cycles, and so on.

Still, using enough cores and non-temporal stores, you can often get very close to the theoretical bandwidth, often 90% or more.

BeeOnRope
  • How do I determine the correct URL for a different Intel CPU? – einpoklum Jul 05 '22 at 10:50
  • @einpoklum - I'm not aware of any simple way. The per-CPU page names follow _some_ structure, but it does vary from family to family (e.g., some mention the cache size, etc). If you really wanted to do this, you'd probably want to scrape all the product URLs (e.g., from the index pages), then do a fuzzy search e.g., for the model number, rather than trying to generate the URL directly. With a few rules this might produce something approaching a reliable result. – BeeOnRope Jul 05 '22 at 19:31