I'm not aware of any standalone tool that does it, but for Intel chips only, if you know the "ARK URL" for the chip, you could get the maximum bandwidth using a combination of a tool to query ARK, like curl
, and something to parse the returned HTML, like xmllint --html --xpath
.
For example, for my i7-6700HQ, the following works:
curl -s 'https://ark.intel.com/products/88967/Intel-Core-i7-6700HQ-Processor-6M-Cache-up-to-3_50-GHz' | \
xmllint --html --xpath '//li[@class="MaxMemoryBandwidth"]/span[@class="value"]/span/text()' - 2>/dev/null
This returns 34.1 GB/s
which is the maximum theoretical bandwidth of my chip.
The primary difficulty is determining the ARK URL, which doesn't correspond in an obvious way to the CPU brand string. One solution would be to find the CPU model on an index page like this one, and follow the link.
This gives you the maximum theoretical bandwidth, which can be calculated as (number of memory channels) x (trasfer width) x (data rate)
. The data rate
is the number of transfers per unit time, and is usually the figure given in the name of the memory type, e.g., DDR-2133
has a data rate of 2133 million transfers per second. Alternately you can calculate it as the product of the bus speed (1067 MHz in this case) and the data rate multiplier (2 for DDR technologies).
For my CPU, this calculation gives 2 memory channels * 8 bytes/transfer * 2133 million transfers/second = 34.128 GB/s
, consistent with the ARK figure.
Note that theoretical maximum as reported by ARK might be lower or higher than the theoretical maximum on your particular system for various reasons, including:
- Fewer memory channels populated than the maximum number of channels. For example, if I only populated one channel on my dual channel system, theoretical bandwidth would be cut in half.
- Not using the maximum speed supported RAM. My CPU supports several RAM types (DDR4-2133, LPDDR3-1866, DDR3L-1600) with varying speeds. The ARK figure assumes you use the fastest possible supported RAM, which is true in my case, but may not be true on other systems.
- Over or under-clocking of the memory bus, relative to the nominal speed.
Once you get the correct theoretical figure, you won't actually reach this figure in practice, due to various factors including the following:
- Inability to saturate the memory interface from one or more cores due to limited concurrency for outstanding requests, as described in the section "Latency Bound Platforms" in this answer.
- Hidden doubling of bandwidth implied by writes that need to read the line before writing it.
- Various low-level factors relating the DRAM interface that prevents 100% utilization such as the cost to open pages, the read/write turnaround time, refresh cycles, and so on.
Still, using enough cores and non-termporal stores, you can often get very close to the theoretical bandwidth, often 90% or more.