How can I tell the compiler not to optimize a sequence of instructions or insert other instructions between them, and force the CPU to execute them back to back?

For example, I'd like a kernel module to execute write (or read, or mixed) operations as fast as possible:

writel(0, addr);
writel(1, addr);
writel(0, addr);

or

writel(0, addr1);
writel(1, addr2);
writel(0, addr3);

Edit:

  • I replaced iowrite32 with writel, whose definition takes a volatile void __iomem *addr.

  • addr* can be allocated with dma_alloc_coherent() or simply ioremap().

  • My question isn't about the order of execution (that is solved with a memory barrier or volatile) but about the delay between the accesses.

  • It might be possible by combining my commands in a single asm volatile() statement, but I'd rather use something safer.

Alexis
  • If `addr` is volatile, the compiler won't reorder the instructions, but then `iowrite32` should be defined as `iowrite32(int, volatile something *)`. If it's not volatile, you will likely have to use compiler-specific pragmas (e.g. `asm volatile("": : :"memory")` in gcc), but if it's not volatile, the compiler might also optimize away everything except the last write, especially if `iowrite32` is a macro/inline. I also don't believe there is a CPU which will reorder multiple writes to the same location. – vgru Mar 26 '20 at 11:14
  • I edited my question to make it more generic. Thanks. If a simple `volatile` is enough that's perfect. Let me check. – Alexis Mar 26 '20 at 11:23
  • I think the CPU will always execute instructions in the order they were given (or at least as-if). The only thing that can reorder instructions is your compiler. – KamilCuk Mar 26 '20 at 11:34
  • That's why I'd like to force the compiler not to reorder instructions. Telling it not to optimize them away with `volatile` doesn't guarantee the instructions will be executed in order and back to back. – Alexis Mar 26 '20 at 11:44
  • @KamilCuk: architectures like IA64 or ARMv7 reorder pretty much everything, and IA32 also does certain types of reorderings (e.g. [this example](https://stackoverflow.com/q/6623628/69809)). – vgru Mar 26 '20 at 11:45
  • @Alexis_FR_JP: now that you've edited the question, it seems like your concern is that the timing of these writes must be exact in terms of CPU cycles? If that's the case, your best bet would be to use inline asm + making sure that no other threads/interrupts are enabled. – vgru Mar 26 '20 at 12:01
  • If you want things to run "as fast as possible", why would you think limiting the capabilities of the compiler would **HELP**? Do you know as much about the architecture you're compiling for as the entire group that wrote the compiler you're using? – Andrew Henle Mar 26 '20 at 12:01
  • @Groo Yes, I hoped I could avoid inline asm, but I'll end up doing that. Thanks for your help and the reminder about the reorderings! – Alexis Mar 26 '20 at 12:07
  • @AndrewHenle Your thinking is too narrow. The compiler has no idea I'd like those instructions literally back to back for a hardware/embedded system use case. Don't you wonder why we need volatile, memory barriers and asm if the compiler is so smart? – Alexis Mar 26 '20 at 12:16
  • You're concerned about running "as fast as possible", and you're posting about memory barriers?!?! – Andrew Henle Mar 26 '20 at 12:20
  • Not necessarily "as fast as possible", but "in order" and "together/back to back". – Alexis Mar 26 '20 at 12:32
  • In the Linux kernel we have two kinds of I/O accessors: a) strict, like `writel()`, `iowrite32be()` and so on, and b) relaxed, like `writel_relaxed()`. If you look deeper, group a) is guaranteed not to be reordered (but you have to keep in mind the specifics of the bus the device sits behind), while group b) allows the compiler to reorder if it sees that's better for code generation. Read this: https://www.kernel.org/doc/html/latest/driver-api/device-io.html – 0andriy Mar 26 '20 at 15:14
  • @0andriy Thanks, I'm not an expert and everything is good to read! I also understand my question might not be very accurate, because I lack knowledge in that field and of what actually matters. I got all the answers and will work with that. Thanks everybody. – Alexis Mar 26 '20 at 22:02

1 Answer

Generally, the only reliable way to make sure that a particular sequence of instructions executes consecutively is to write all of them in a single asm volatile statement.

The gcc manual says this explicitly (6.47.2.2):

Do not expect a sequence of asm statements to remain perfectly consecutive after compilation, even when you are using the volatile qualifier. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement.
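
As a rough illustration of that advice, here is a minimal user-space sketch, assuming x86-64 and GCC or Clang. The helper name `pulse_same_addr` is hypothetical, and an ordinary `uint32_t` stands in for a real device register; in a kernel module the pointer would come from ioremap() instead. On other architectures the sketch falls back to plain volatile stores, which do not carry the same "consecutive" property.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write 0, 1, 0 to one address from a single asm
 * statement, so the compiler cannot place anything between the stores. */
static inline void pulse_same_addr(volatile uint32_t *addr)
{
    assert(addr != NULL);
#if defined(__x86_64__)
    __asm__ volatile(
        "movl $0, (%0)\n\t"   /* first store  */
        "movl $1, (%0)\n\t"   /* second store */
        "movl $0, (%0)"       /* third store  */
        :                     /* no outputs */
        : "r"(addr)           /* address is in a register before the asm starts */
        : "memory");          /* the asm writes memory the compiler can't see */
#else
    /* Assumption: fallback for other targets, plain volatile stores only. */
    *addr = 0;
    *addr = 1;
    *addr = 0;
#endif
}
```

This only constrains what the compiler emits. Interrupts, preemption, and whatever the bus does with the writes are outside its control.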

I'm not sure what architecture you have in mind, but for your second example in particular, the compiler might need to do some work before each writel to get the appropriate address into the appropriate register. To meet your requirements, you'd want it to do all that work up front, and I don't know of any way to force it to do so.
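
For what it's worth, when the stores live in one asm statement, inputs given with the "r" constraint are placed in registers before the asm body begins, so the address setup should not land between the stores. Below is a hedged sketch of the three-address case, assuming x86-64 and GCC or Clang; `write_three` is a hypothetical name and ordinary variables stand in for device registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store 0, 1, 0 to three different addresses.
 * All three pointers are asm inputs, so they are loaded up front. */
static inline void write_three(volatile uint32_t *a1,
                               volatile uint32_t *a2,
                               volatile uint32_t *a3)
{
    assert(a1 != NULL && a2 != NULL && a3 != NULL);
#if defined(__x86_64__)
    __asm__ volatile(
        "movl $0, (%0)\n\t"
        "movl $1, (%1)\n\t"
        "movl $0, (%2)"
        :
        : "r"(a1), "r"(a2), "r"(a3)
        : "memory");
#else
    /* Assumption: fallback for other targets; here the compiler may
     * still interleave address computations with the stores. */
    *a1 = 0;
    *a2 = 1;
    *a3 = 0;
#endif
}
```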

Telling the compiler "not to optimize" will usually accomplish the opposite of what you want. For instance, with your first example, without optimizations the compiler will probably not realize that it can keep addr in the same register throughout, and will generate code to reload it each time.
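
One way to check this on your own toolchain is to compile a small stand-in with `gcc -O0 -S` and again with `gcc -O2 -S`, then compare the generated assembly. The sketch below is user-space code, not the kernel's writel(): `pulse` is a hypothetical name and a volatile pointer plays the role of the I/O accessor. With optimization enabled, the three stores typically share one address register; at -O0 the pointer is typically reloaded from the stack before each store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for three writel() calls to the same address.
 * volatile keeps all three stores, in order, at any optimization level. */
void pulse(volatile uint32_t *addr)
{
    /* Precondition check; build with -DNDEBUG to keep it out of the assembly. */
    assert(addr != NULL);
    *addr = 0;
    *addr = 1;
    *addr = 0;
}
```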

Nate Eldredge
  • This isn't a precise answer to the question. The OP asked about I/O accessors, which are not the same as memory accessors. They have a different effect on I/O where a DMA engine is involved. It's not about general C stuff. Also, don't forget bus specifics (like the write-combining feature). – 0andriy Mar 26 '20 at 15:09
  • Well, I think it answers the specific question about how to "execute instructions back to back". Whether the OP's underlying problem (which they didn't explain) is actually solved by executing instructions back to back is a separate question, and you're welcome to address that if you understand what the underlying problem is. – Nate Eldredge Mar 26 '20 at 20:28
  • When answering a question, I think a good response covers it as a whole. – 0andriy Mar 27 '20 at 07:47