How to write x86 assembly code to check the effect of temperature on the performance of the processor

Question

I have to write x86 assembly code that runs on an Intel x86 processor.

Specifically, I have to write instructions such as additions or moves and see their effect on the processor's performance with respect to temperature. That means my code should be able to generate heat from the processor in a controlled way.

If anyone has such code, or has experience writing this kind of code, please share.

Edited your title so that people won't be misled by "malicious". If you need "controlled heat generation" use some feedback method. You might not need asm at all. — Jester, Dec 29 '16 at 13:58
Wait a minute "check the effect of temperature on performance?" Why are you writing your own code for this? Why don't you just use prime95 to trigger thermal throttling like a normal person? I didn't notice the "check the effect on performance" part of the title while answering, since you didn't even mention that in the question body. — Peter Cordes, Dec 29 '16 at 14:06
Thanks @PeterCordes. I have to do this for my project. I want to write my own code because I want to measure the smallest possible temperature increase. — qah, Dec 29 '16 at 15:13
You should have said that in the question. Updated the answer with some ideas on making minimal heat above idle. — Peter Cordes, Dec 29 '16 at 23:51

score 5 · Accepted Answer · edited May 23 '17 at 12:02

For maximum heat, you want as many transistors as possible changing state every clock cycle. The floating point FMA units have a lot of transistors; keeping them busy makes a lot of heat, especially for 256b AVX vectors.
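A minimal C sketch of the "keep the FMA units busy" idea, assuming GCC or Clang on x86-64 built with -O2 -mfma (or -march=native on an FMA-capable CPU); fma_burn is a hypothetical name. It keeps 8 independent accumulator chains because FMA throughput is only saturated when roughly latency × number of FMA units operations are in flight (4-cycle latency × 2 units on Skylake, per Agner's tables); the right count for other microarchitectures is an assumption to tune. Without -mfma it falls back to a plain multiply-add loop, which the compiler may or may not contract into FMA instructions, so check the generated asm.

```c
#include <stddef.h>

#ifdef __FMA__
#include <immintrin.h>
#endif

/* Runs `iters` rounds of multiply-add on 8 independent accumulators and
 * returns their sum so the compiler cannot discard the work.
 * Each accumulator converges toward 1.0 (x = x*m + a with m + a == 1),
 * so values stay normal and finite: no denormal slowdowns. */
float fma_burn(size_t iters) {
#ifdef __FMA__
    const __m256 m = _mm256_set1_ps(0.999999f);
    const __m256 a = _mm256_set1_ps(0.000001f);
    __m256 acc[8];
    for (int k = 0; k < 8; k++) acc[k] = _mm256_set1_ps(1.0f);
    for (size_t i = 0; i < iters; i++) {
        /* 8 independent 256-bit FMA chains: acc = acc * m + a */
        for (int k = 0; k < 8; k++)
            acc[k] = _mm256_fmadd_ps(acc[k], m, a);
    }
    /* Reduce all vectors and lanes to one float. */
    __m256 s = acc[0];
    for (int k = 1; k < 8; k++) s = _mm256_add_ps(s, acc[k]);
    float lanes[8];
    _mm256_storeu_ps(lanes, s);
    float total = 0.0f;
    for (int k = 0; k < 8; k++) total += lanes[k];
    return total;
#else
    /* Scalar fallback: same recurrence, 8 independent chains. */
    float acc[8];
    for (int k = 0; k < 8; k++) acc[k] = 1.0f;
    for (size_t i = 0; i < iters; i++) {
        for (int k = 0; k < 8; k++)
            acc[k] = acc[k] * 0.999999f + 0.000001f;
    }
    float total = 0.0f;
    for (int k = 0; k < 8; k++) total += acc[k];
    return total;
#endif
}
```

This is single-threaded; a Prime95/Linpack-style load would run something like it on every core at once (separate threads or processes).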

e.g. see the "stress testing" section of this Skylake overclocking guide, where you can see that Prime95 version 28 and Linpack are the hottest-running workloads. There's also a table of whole-system power consumption.

See also http://agner.org/optimize/ to learn more about CPU internals, especially Agner's microarch guide. You should be able to make less or more heat depending on whether your loop fits in the loopback buffer. The x86 decoders are much more power-intensive than reusing already-decoded uops. See this Q&A about uop throughput for various loop sizes, for the case where there aren't significant dependencies between the instructions, so only the front-end limits throughput. (See also the x86 tag wiki.)

I doubt you'll see much difference in heat between integer add reg, reg and mov reg, reg or something similar. Maybe saturating the throughput of the integer mul unit would make a measurable heat / power difference, but the different cost of an adder vs. a mov or a simple boolean op is probably dwarfed by the power cost of out-of-order execution tracking the add through the pipeline.

Loads or stores that keep the cache and store-buffer hardware active might be a different story, but add can have a memory source or dest too. Just make sure you don't bottleneck your loop on the store-forwarding latency of a single memory-destination add.
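To illustrate the store-forwarding point, here is a sketch using GCC/Clang inline asm (x86 assumed; mem_add_burn and NCOUNTERS are hypothetical names). It spreads memory-destination adds across 8 separate counters, so each counter is its own load/add/store chain and the chains can overlap instead of all waiting on one address. The count of 8 is a guess, not a tuned value.

```c
#include <stddef.h>

#define NCOUNTERS 8

/* Increments each of NCOUNTERS separate memory locations `iters` times.
 * On x86 each increment is a memory-destination `addl`, i.e. a
 * load -> add -> store chain that goes through store forwarding.
 * Using several independent addresses lets those chains overlap. */
void mem_add_burn(unsigned int counters[NCOUNTERS], size_t iters) {
    for (size_t i = 0; i < iters; i++) {
        for (int j = 0; j < NCOUNTERS; j++) {
#if defined(__x86_64__) || defined(__i386__)
            __asm__ volatile("addl $1, %0" : "+m"(counters[j]) : : "cc");
#else
            /* Non-x86 fallback: volatile forces a real load and store. */
            ((volatile unsigned int *)counters)[j] += 1;
#endif
        }
    }
}
```

Using only counters[0] instead would serialize every add on the store-forwarding latency of that one location, which is exactly the bottleneck to avoid.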

For minimum heat without actually sleeping, use the pause instruction in a loop. On Skylake, pause stalls much longer (Intel's optimization manual cites as many as ~140 cycles) than on previous Intel microarchitectures (about 10 cycles).
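A minimal pause spin loop in C, assuming GCC or Clang; pause_spin and SPIN_HINT are hypothetical names. On x86, _mm_pause() from <immintrin.h> emits the pause instruction; on other architectures the sketch only uses an empty compiler barrier, which does not have the same low-power effect.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>              /* _mm_pause */
#define SPIN_HINT() _mm_pause()
#else
/* Non-x86 fallback: compiler barrier only, so the loop isn't removed. */
#define SPIN_HINT() __asm__ volatile("" ::: "memory")
#endif

/* Spins for `iters` iterations, executing one `pause` per iteration on x86.
 * Returns the number of iterations run. */
size_t pause_spin(size_t iters) {
    size_t done = 0;
    for (size_t i = 0; i < iters; i++) {
        SPIN_HINT();
        done++;
    }
    return done;
}
```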

According to powertop on Linux, the kernel uses mwait with different hints to enter different levels of sleep on Intel CPUs (e.g. my Skylake desktop). You might be able to do this from user-space if you want, or use nanosleep to alternate sleep/wake and run a heat-producing workload with a certain duty cycle.
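A sketch of the nanosleep duty-cycle approach, assuming a POSIX system such as Linux with clock_gettime and nanosleep; duty_cycle, now_ns and sleep_ns are hypothetical names. In each period it busy-spins for busy_pct percent of the time and sleeps for the rest. The busy phase is just a counter loop; a heavier workload (e.g. FMA) could be swapped in.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleeps for `ns` nanoseconds, resuming if interrupted by a signal. */
static void sleep_ns(long long ns) {
    struct timespec req = { ns / 1000000000LL, ns % 1000000000LL };
    struct timespec rem;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
}

/* For each of `periods` periods of length `period_ns`: spin for busy_pct
 * percent of the period, then sleep for the rest.
 * Returns total elapsed wall-clock time in nanoseconds. */
long long duty_cycle(long long period_ns, int busy_pct, int periods) {
    long long busy_ns = period_ns * busy_pct / 100;
    long long start = now_ns();
    for (int p = 0; p < periods; p++) {
        long long t0 = now_ns();
        volatile unsigned long sink = 0;
        while (now_ns() - t0 < busy_ns)
            sink++;                          /* busy phase: core stays awake */
        if (period_ns > busy_ns)
            sleep_ns(period_ns - busy_ns);   /* idle phase */
    }
    return now_ns() - start;
}
```

For example, duty_cycle(10000000, 50, 500) alternates about 5 ms busy and 5 ms asleep for roughly 5 seconds. Stepping busy_pct through a series of values gives a family of workloads with increasing average load; whether each step is distinguishable in temperature depends on your cooling, turbo behavior, and the OS frequency governor.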

Sleeping frequently may prevent the OS from ramping the CPU up to full clock speed, depending on your setup. See the Q&A "Why does this delay-loop start to run faster after several iterations with no sleep?"

For other ideas on reducing throughput in a loop, see Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs. Stalls that are just slow without flipping a lot of transistors to recover might be a good way to make a loop that doesn't make much heat.

Without pause, you'll see significant heating from just a simple infinite loop like .repeat: jmp .repeat, especially on a CPU that can "turbo" up to a high voltage/frequency for as long as thermal limits allow.
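A bounded version of that spin loop, as a sketch in C with GNU inline asm (x86 assumed; spin_no_pause is a hypothetical name). It uses sub/jnz instead of a bare jmp so it terminates after iters iterations; with no pause inside, the core runs the loop back-to-back.

```c
/* Spins `iters` times with a tight sub/jnz loop and no pause.
 * Returns the number of iterations executed. */
unsigned long spin_no_pause(unsigned long iters) {
    if (iters == 0)
        return 0;                 /* sub/jnz would wrap around on 0 */
    unsigned long n = iters;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        "1:\n\t"
        "sub $1, %0\n\t"          /* n -= 1, sets ZF when n hits 0 */
        "jnz 1b"                  /* loop back until n == 0 */
        : "+r"(n)
        :
        : "cc");
#else
    /* Non-x86 fallback: volatile keeps the loop from being optimized out. */
    volatile unsigned long v = n;
    while (v != 0)
        v--;
    n = v;
#endif
    return iters - n;             /* n is 0 here */
}
```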

I think using floating-point FMA will increase the temperature drastically. With integer instructions, I want to produce as small a temperature change as I can. Can I do this with a loop, if there is any way? Do you have any example code? Thanks a lot. — qah, Dec 29 '16 at 15:21
A tight loop with `add`, `mov`, whatever would be sufficient to keep the processor from sleeping, thus generating heat. Not an excess amount, certainly, but it sounds like this is the type of small impact that the OP is looking for. The trick will be contending with the OS's scheduler if you're trying to precisely "control" the amount of heat generated. — Cody Gray - on strike, Dec 29 '16 at 18:01
@PeterCordes @CodyGray I have run stress code from Prime95 that gradually increases the temperature by up to 20 degrees; in my case it has also gone from 40 to 80. Now I want to insert some lines of code, let's say 4 different lines, that cause a small temperature deviation compared to the original code, and do this 20 times to get 20 different temperature deviations. How and where can I insert these lines 20 times to get 20 different deviations in temperature? Thanks — qah, Jan 30 '17 at 16:59
