It may be viable to disable turbo, or whatever other vendors call their boost clocks, so the clock frequency stays at the base frequency the laptop can sustain indefinitely.
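On Linux, for example, turbo can usually be toggled through sysfs; which knob exists depends on the cpufreq driver, so check which of these paths is present on your machine:

    # intel_pstate driver: writing 1 disables turbo
    echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

    # acpi-cpufreq (and some other drivers): writing 0 disables boost
    echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost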
Running the CPU a lot slower (e.g. 1.6 GHz instead of 3 GHz) changes the relative cost of a cache miss to DRAM vs. a branch mispredict: DRAM still takes about the same number of nanoseconds, but that's a lot fewer clock cycles at the lower frequency. So it's not perfect if the thing you're comparing involves that kind of tradeoff. The same applies to I/O cost vs. CPU cost.
If you can get your system to run at a couple of different low but stable frequencies, you can extrapolate performance at higher frequencies, even for workloads that are sensitive to memory latency and maybe bandwidth. Dr. Bandwidth explains how in a blog article with slides from his HPC conference talk on it.
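The core idea, as I understand it, is to split run time into a part that scales with clock frequency and a part that doesn't, from two measurements at known fixed frequencies. A minimal sketch with made-up numbers (the real model in the talk handles more than this):

    # Measured wall times at two pinned frequencies (made-up example numbers):
    # 10.0 s at 1.2 GHz, 7.0 s at 2.0 GHz.  Model: time(f) = cycles/f + flat_time
    awk -v f1=1.2e9 -v t1=10.0 -v f2=2.0e9 -v t2=7.0 -v f3=3.0e9 'BEGIN {
        cycles = (t1 - t2) / (1/f1 - 1/f2)   # work that scales with frequency, in core cycles
        flat   = t1 - cycles/f1              # frequency-independent time (memory, I/O)
        printf "core cycles: %.3g  flat time: %.3g s  predicted at %.1f GHz: %.3g s\n",
               cycles, flat, f3/1e9, cycles/f3 + flat
    }'

With those numbers it splits the 10 s run into 7.5 s of frequency-scaled work plus 2.5 s flat, and predicts about 5.5 s at 3 GHz. Memory bandwidth and uncore clocks presumably aren't perfectly flat in practice, which is the kind of thing the talk goes into.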
For mostly CPU-bound stuff (not memory or I/O), perf stat ./my_program
can be useful: look at time in core clock cycles instead of seconds. This doesn't even try to control for relative differences in cache miss costs vs. on-core effects, but is convenient if you're on Linux or another OS that has a handy profiler that can use HW performance counters. (Usually only works on bare metal, not in a VM; most VMs don't virtualize the performance-monitoring unit.)
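For example, something like this (event names are perf's generic ones; not every CPU/kernel exposes all of them):

    # core clock cycles, instructions, and last-level-cache misses
    perf stat -e task-clock,cycles,instructions,LLC-load-misses ./my_program

If the cycles count stays roughly the same across runs while the clock bounces around, cycles is a fair comparison metric for that workload.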
If L3 cache misses are a significant part of the performance cost, you'd expect the core clock cycle count itself to vary with frequency, again because RAM becomes relatively faster / lower latency compared to the CPU core at low clock speeds: a miss costs fewer core cycles, and out-of-order exec can hide more of its latency.
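A quick way to check is to pin the clock at two different frequencies and compare the cycle counts (the frequencies below are arbitrary examples; cpupower frequency-set -u / -d set the max/min the governor may pick):

    # pin ~1.2 GHz: lower the max first, then raise the min to match
    sudo cpupower frequency-set -u 1.2GHz
    sudo cpupower frequency-set -d 1.2GHz
    perf stat -e cycles ./my_program
    # pin ~2.0 GHz: raise the max first, then the min
    sudo cpupower frequency-set -u 2.0GHz
    sudo cpupower frequency-set -d 2.0GHz
    perf stat -e cycles ./my_program

If the cycle counts differ a lot between the two frequencies, memory effects matter and the extrapolation approach above is the safer bet.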
See also Idiomatic way of performance evaluation? for other benchmark considerations not related to keeping frequency stable.
See Why can't my ultraportable laptop CPU maintain peak performance in HPC for a good example of an ultraportable laptop's CPU frequency vs. time when running power-intensive loads, and the CPU-design reasons for it being that way.