There are various reasons I might want to spike the CPU temperature as much as possible:

  • Testing cooling setups.
  • Increasing the fan speed without messing with the BIOS settings.
  • Testing the effect of temperature throttling on other processes running on the same die.

I know that modern CPUs have numerous functional units and various tricks to try to keep as many of them occupied as possible. So, I know this answer might have some variants for specific CPU micro-architectures.

But, what code can I run that will do the most to spike the CPU temperature?

Inline assembly is fine. But, it would be nice to have an answer in C or C++.

I did write this function that I hope will keep several different functional units fairly occupied with pointless calculation:

#include <cstdint>
#include <limits>
#include <random>

void busy_busy_busy()
{
   using uint_t = ::std::uint64_t;
   const auto maxuint = ::std::numeric_limits<uint_t>::max();
   ::std::random_device random_device;
   ::std::mt19937_64 generator(random_device());
   ::std::uniform_int_distribution<uint_t> ints(0, maxuint >> 1);
   ::std::uniform_real_distribution<long double> doubles(0, 1);
   long double starting = doubles(generator);
   long double const one_third = 1.0L / 3.0L;
   // Runs until interrupted in practice: landing exactly on one_third is
   // extremely unlikely.
   while (starting != one_third) {
      auto tmp = starting;
      tmp *= doubles(generator);     // floating-point multiply
      starting *= ints(generator);   // integer-to-float conversion and multiply
      starting += tmp;               // floating-point add
      starting /= maxuint >> 2;      // floating-point divide
   }
}

Off-the-shelf benchmarks are typically not designed to do this. They're designed to test speed of execution of either loads that are purposefully synthetic, or loads that are designed to mimic real-world usage. They will increase the CPU temperature, but I'm after code that's very specifically designed to increase the CPU temperature with a synthetic load that would likely never occur in these benchmarks.

I, personally, have an AMD Ryzen 9 3950X.

Omnifarious
  • You forgot to include your attempt at solving this problem. – Scott Hunter Dec 27 '22 at 15:17
  • Make all cores very busy? Or run a benchmark app? – Chris O Dec 27 '22 at 15:18
  • Run your code in several threads (N = number of cores). – HolyBlackCat Dec 27 '22 at 15:22
  • 1) Use low-level libs to access the CPU fan to slow it down dramatically. 2) Buy a new CPU. – Déjà vu Dec 27 '22 at 15:24
  • @HolyBlackCat - I want to do this to only a single core. Then, if I want to spike the whole die, I can just run the same code on all cores. – Omnifarious Dec 27 '22 at 15:26
  • The AVX instructions use a lot of power. – drescherjm Dec 27 '22 at 15:27
  • Find some CPU-intensive benchmark software. – NovaDenizen Dec 27 '22 at 15:28
  • To get the temperature as high as possible, you need to investigate the specific CPU model’s documentation thoroughly. You likely need to use whatever its widest SIMD type is, whether that is 256 bits or 512 bits. You should find out how many arithmetic units of each type it has, along with their latencies, and write a loop with one instruction for each arithmetic unit, using independent registers. You should also look into the load/store units and see if you can keep them busy without causing waits for memory response. – Eric Postpischil Dec 27 '22 at 15:33
  • To do literally what you're asking would be difficult because multi-core CPUs move a heavy load around to avoid thermal throttling. I think it might be the OS that does this so there might be some low level 'kernel' programming that could keep a heavy job running on the same core. Tough to do though... – Simon Goater Dec 27 '22 at 15:34
  • @SimonGoater - On Linux it's possible to use cgroups to limit which cores a process can run on. And I think there are other ways to accomplish this as well. And so this is very doable without that much effort. – Omnifarious Dec 27 '22 at 15:36
  • You should also write in assembly language, so you can ensure the instructions you want are issued. Otherwise the compiler either optimizes them away because they do nothing useful, or you have to tie them to some inputs and outputs to prevent that, and those ties create dependencies that may make the processor wait to issue instructions. In particular, in `starting *= ints(generator); starting += tmp; starting /= maxuint >> 2;`, none of those operations can start until the previous one completes, since they each need `starting`. – Eric Postpischil Dec 27 '22 at 15:36
  • What if you supply the heat externally, e.g. with a hair dryer? –  Dec 27 '22 at 15:36
  • I'm highly amused that someone submitted a ChatGPT generated answer, and how easy it was to spot. – Omnifarious Dec 27 '22 at 15:49
  • So, you want a bunch of independent vector operations of the maximum width your hardware supports that don't waste time storing results in memory. Maybe throw in some scalar operations and some x87 instructions to see if the processor can do them at the same time, and somehow containerise it so it runs only on one core. For cores with hyper-threading, you may also need to run two or more threads on it to max it out? You could increase the CPU voltage as well for extra warmth in these cold days... – Simon Goater Dec 27 '22 at 16:03
  • [Related](https://stackoverflow.com/q/12715461/509868) – anatolyg Dec 27 '22 at 16:04
  • Some of these comments are very close to being answers. Not the ones that suggest non-programmatic means, or the one that suggests I should purposefully destroy my CPU with too much heat. – Omnifarious Dec 27 '22 at 19:30
  • You can use the [taskset](https://www.man7.org/linux/man-pages/man1/taskset.1.html) utility (included in util-linux package, available in all Linux distributions) to confine your process to specific core or cores. It uses the Linux kernel [cpuset](https://www.man7.org/linux/man-pages/man7/cpuset.7.html) facility to do what it does. That leaves the core stressing part. You can either write your own – complicated for x86-64, having superscalar ALUs and SIMD vector engines –, or use [cpuburn](http://www.cpuburnin.com/). – Blabbo the Verbose Dec 28 '22 at 01:01
  • Related: [How do I achieve the theoretical maximum of 4 FLOPs per cycle?](https://stackoverflow.com/q/8389648) for SnB, or similar code for later CPUs using FMA. Or Prime95 stress tests. Pin it to a specific core if you want. – Peter Cordes Dec 28 '22 at 02:59
  • @BlabbotheVerbose - cpuburn appears to be proprietary, and to have been written a very long time ago and not updated since. Disassembling it, the main loop uses no vector instructions. Otherwise, your comment would be a candidate for an answer. Thank you. – Omnifarious Dec 28 '22 at 14:41
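
Several comments suggest confining the load to one core with taskset or cgroups. The same thing can be done from inside the program. What follows is a minimal sketch, assuming Linux with glibc; `pin_to_first_allowed_cpu` is a hypothetical helper name, not something from the question or the comments:

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, cpu_set_t (Linux/glibc)

// Hypothetical helper (assumption, not from the original thread).
// Pins the calling thread to the lowest-numbered CPU it is currently
// allowed to run on. Returns that CPU's index, or -1 on failure.
int pin_to_first_allowed_cpu()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t only;
            CPU_ZERO(&only);
            CPU_SET(cpu, &only);
            if (sched_setaffinity(0, sizeof(only), &only) != 0) {
                return -1;
            }
            return cpu;
        }
    }
    return -1;  // no allowed CPU found in the mask
}
```

Calling this at the start of a worker thread, before entering the busy loop, keeps the scheduler from migrating the load between cores. Starting from the currently allowed mask rather than hard-coding CPU 0 avoids failures inside containers or cpusets that exclude CPU 0.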

1 Answer


Here is a not-so-good answer, and I will likely not be accepting it. But it is an update to my original program that I think works better for the task based on some of the helpful almost-an-answer comments:

#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>

void busy_busy_busy()
{
   using uint_t = ::std::uintmax_t;
   const auto maxuint = ::std::numeric_limits<uint_t>::max();
   constexpr auto vecsize = 128 / sizeof(double);   // 16 doubles per vector
   using vec_t = ::std::array<double, vecsize>;
   constexpr double double_iota = ::std::numeric_limits<double>::epsilon();

   ::std::random_device random_device;
   ::std::mt19937_64 generator(random_device());
   ::std::uniform_int_distribution<uint_t> ints(0, maxuint >> 1);
   ::std::uniform_real_distribution<double> doubles(1, 1 + (1 - double_iota));
   // Fill a vector with random values in [1, 2).
   auto random_vec = [&generator, &doubles]() {
      vec_t randvec;
      for (auto &i: randvec) {
         i = doubles(generator);
      }
      return randvec;
   };
   auto dot_product = [](vec_t const &a, vec_t const &b) {
      vec_t::value_type sum = 0;
      for (::std::size_t i = 0; i < a.size(); ++i) {
         sum += a[i] * b[i];
      }
      return sum;
   };
   vec_t starting = random_vec();
   vec_t::value_type last_dot = 0;
   vec_t::value_type const one_third = 1.0 / 3.0;
   // Runs until interrupted in practice: the dot product never lands on
   // one_third. Note that every element is at least 1, so the values grow on
   // each pass and saturate at +inf within about ten iterations.
   while (last_dot != one_third) {
      auto tmp = dot_product(starting, random_vec());
      for (auto &i: starting) {
         i *= tmp;
      }
      last_dot = tmp;
   }
}

A recent gcc will vectorize this appropriately: https://compiler-explorer.com/z/96se4MxPa
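
The comments point out that a single chain of dependent operations leaves most arithmetic units idle. A rough sketch of a kernel built from many independent chains follows. It is an illustration under stated assumptions, not a tuned power virus: `spin_rotations` and the lane count of 32 are hypothetical choices, and whether the compiler emits AVX or FMA instructions depends on flags such as `-O3 -march=native`, so the disassembly should be checked. Each lane rotates a 2-D point by a fixed angle, which keeps the values bounded and changing instead of drifting to zero or infinity.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel (assumption, not from the original post).
// Rotates `lanes` independent 2-D points by a fixed angle, `iterations` times.
// Each rotation is four multiplies plus an add and a subtract, and no lane
// depends on another, so the out-of-order core can overlap the lanes.
double spin_rotations(std::uint64_t iterations)
{
    constexpr std::size_t lanes = 32;  // assumption: tune per micro-architecture
    std::array<double, lanes> x{}, y{};
    for (std::size_t i = 0; i < lanes; ++i) {
        // Start every lane at a different point on the unit circle.
        x[i] = std::cos(0.1 * static_cast<double>(i));
        y[i] = std::sin(0.1 * static_cast<double>(i));
    }

    const double c = std::cos(0.7);
    const double s = std::sin(0.7);
    for (std::uint64_t n = 0; n < iterations; ++n) {
        for (std::size_t i = 0; i < lanes; ++i) {
            const double nx = x[i] * c - y[i] * s;
            const double ny = x[i] * s + y[i] * c;
            x[i] = nx;
            y[i] = ny;
        }
    }

    // Return a value derived from every lane so the compiler has to keep the
    // work. A rotation preserves length, so this stays close to `lanes`.
    double norm_sum = 0.0;
    for (std::size_t i = 0; i < lanes; ++i) {
        norm_sum += x[i] * x[i] + y[i] * y[i];
    }
    return norm_sum;
}
```

To use it as a load, call it in a loop with a large iteration count and consume the result (for example, print it), once per thread for however many cores should be heated.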

Omnifarious