I would like to swap two variables, and I would like to do it through the pipeline, using a read-after-write (RAW) hazard to my advantage.

Pipeline:

    OPER        Fetch       Decode      Execute     WriteBk
    STORE X, Y  ----------  ----------  ----------  ----------
    STORE Y, X  STORE X, Y  ----------  ----------  ----------
    ----------  STORE Y, X  STORE X, Y  ----------  ----------
    ----------  ----------  STORE Y, X  STORE X, Y  ----------
    ----------  ----------  ----------  STORE Y, X  STORE X, Y
    ----------  ----------  ----------  ----------  STORE Y, X

How do I go about telling the compiler to do that (and exactly that) without automatic locks and warning flags? Can you suggest any literature/keywords?

specs:

  • target: modern architectures that support multi-stage (more than 4 stages) pipelining

  • this is not related to any particular 'problem'; it is just for the sake of science.

current hurdles:

  • if you know how to ignore data hazards, please share.
CLASSIFIED
  • _Why is there no tag for hazards?_ Because nobody's made one yet... – Lightness Races in Orbit Dec 08 '11 at 13:00
  • Pipeline hazards are dealt with by the CPU, not the compiler... – Oliver Charlesworth Dec 08 '11 at 13:06
  • The C++ abstract machine does not have an instruction pipeline. What is the actual problem that you are trying to solve? – Mankarse Dec 08 '11 at 13:32
  • @OliCharlesworth, MIPS meant "Microprocessor without Interlocked Pipeline Stages" in order to indicate that hazards had to be handled by the programmer or the compiler, but that approach exposes too much of the micro-architecture to be viable. Different models have different pipelines, and without going to the 40+ stages of P4, most have more than 10 stages. – AProgrammer Dec 08 '11 at 13:37
  • @OliCharlesworth - then how do you talk to the cpu?/tell the cpu to not be stupid? – CLASSIFIED Dec 08 '11 at 13:43
  • @Mankarse not any particular problem - I just want to try to implement this particular approach for the sake of science. – CLASSIFIED Dec 08 '11 at 13:45
  • @CLASSIFIED - In that case you would be better off writing the whole thing in the assembly language for the architecture that you are interested in. C++ provides almost no access to the underlying hardware (although compilers often extend the language to give you slightly more control). – Mankarse Dec 08 '11 at 13:49
  • @Mankarse: Even if you write this in assembler, the CPU will spot the RAW hazard. – Oliver Charlesworth Dec 08 '11 at 13:58
  • @OliCharlesworth: That depends on the CPU (which is not specified in the question). – Mankarse Dec 08 '11 at 13:59
  • +1 for an interesting question for which I have no idea of the answer. – John Dibling Dec 08 '11 at 14:01
  • @Mankarse: Fair point. Ok, I'll rephrase as "most modern desktop/server CPUs will spot the RAW hazard". – Oliver Charlesworth Dec 08 '11 at 14:01
  • @Mankarse that's the answer I was afraid of - most non primitive architectures have the capability to do just that if I understand this correctly - however the portability goes to shreds if I start writing hardware specific code. it'll suffice for a proof of concept, but it's not as elegant as i'd hoped. thanks! – CLASSIFIED Dec 08 '11 at 14:02
  • @Mankarse ah well yeah about the hazard spotting that's the entire point – CLASSIFIED Dec 08 '11 at 14:03
  • @CLASSIFIED: If you're after portability, then I'd suggest writing `T tmp = x; x = y; y = tmp;` (or simply `std::swap(x,y)`). I would hope that the compiler would always do the most optimal thing given the limitations of the architecture. – Oliver Charlesworth Dec 08 '11 at 14:06
  • @OliCharlesworth yeah no. that absolutely defeats the point. you'll still have at least 33% more instructions and dependency gaps which need to be filled. – CLASSIFIED Dec 08 '11 at 14:10
  • @CLASSIFIED: My point is, if it's possible to achieve this trick on a given CPU, then the compiler author probably already knows about it. – Oliver Charlesworth Dec 08 '11 at 14:19
  • @OliCharlesworth IC. but then the compiler would have to recognize `T tmp = x; x = y; y = tmp;` as a swap, and it's written in the standard that std::swap is to be executed linearly, which takes approx 280% longer to do on an individual basis. so there's not really a way to implement this into the existing language, thus I don't expect it to be there even if the authors did know about it. I absolutely see what you're saying though. – CLASSIFIED Dec 08 '11 at 14:33
  • @CLASSIFIED: Where in the standard does it say that `std::swap` is to be executed linearly, and why doesn't the "as-if" rule apply? As far as I can tell, after `std::swap(a, b);`, `a` and `b` have exchanged values, and if there are no side effects in evaluating `a` and `b` the compiler is free to do as it likes behind the scenes. – David Thornley Dec 08 '11 at 15:53
  • @DavidThornley http://www.cplusplus.com/reference/algorithm/swap/ my particular compiler does not support it. i mean if it did, i wouldn't be asking, would i? – CLASSIFIED Dec 08 '11 at 16:36
  • @CLASSIFIED: The cplusplus.com reference seems to be meant to apply to larger data structures, where copy constructors and assignment operators can take significant time, and where the compiler is unlikely to come up with the appropriate shortcut. It doesn't mean it has to work that way. The Standard specifies that the values will be swapped, and doesn't say how. Exactly how `std::swap` works is completely up to the implementation, so the question isn't about C++ so much as an implementation of C++. – David Thornley Dec 08 '11 at 17:14
  • @DavidThornley okay, if you say so. that doesn't really help though. – CLASSIFIED Dec 08 '11 at 17:21
  • Ok folks, you all have enough rep so it's time to move this to chat. Thanks. – Kev Dec 08 '11 at 18:55
  • @Kev - I lack the ability to do so. i saw the option before, but it's not here anymore. you're a mod, right? can you do it? – CLASSIFIED Dec 08 '11 at 19:25
  • Try this: http://chat.stackoverflow.com/rooms/5696/swapping-variables-c-processor-level – Kev Dec 08 '11 at 19:34

2 Answers

I suggest that you read the first parts of Intel's optimization manual. Then you will realize that a modern, out-of-order, speculative CPU does not even respect your assembly language. Manipulating the pipeline to your advantage? Based on this document, I'd say -- forget it.

zvrba
  • you're right - might not be the most efficient way to get it done at all - however it still respects instructions which translate into individual microops. I'm still gonna try to get it in there. – CLASSIFIED Dec 09 '11 at 18:44

This would depend on which CPU you're targeting, and which compiler. You don't specify either.

In general, a CPU will go to great lengths to pretend that everything is executed in order, even when in reality it's superscalar behind the scenes. Code that tries to take advantage of hazards doesn't break; instead, it executes more slowly, because the CPU waits for the hazard to clear before continuing. Otherwise, almost all code would fail on future generations of the CPU as superscalar behavior increases.

In general, unless you're on a very specialized architecture and you have complete assembly-level control of execution, you will not be able to go anywhere with this idea.
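
To make the point concrete at the C++ level, here is a minimal sketch (the function names `naive_two_stores` and `swap_with_temp` are made up for illustration). It assumes nothing about the target CPU; it only shows what the language guarantees. The two back-to-back assignments from the question do not swap anything, because the second one must observe the result of the first, which is the same in-order illusion described above.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: the two "stores" from the question, written naively.
// Sequential semantics require the second assignment to read the value
// written by the first, so the read-after-write dependency is honored.
inline void naive_two_stores(int& x, int& y) {
    x = y;  // x now holds the old value of y
    y = x;  // reads the NEW x, so y keeps its old value
}

// The portable swap suggested in the comments: one temporary, three copies.
inline void swap_with_temp(int& x, int& y) {
    int tmp = x;
    x = y;
    y = tmp;
}

// std::swap(x, y) from <utility> has the same observable effect as
// swap_with_temp; how it is lowered to machine code is up to the
// implementation.
```

Calling `naive_two_stores(a, b)` with `a = 1, b = 2` leaves both equal to 2, while `swap_with_temp` and `std::swap` exchange the values. For small types an optimizing compiler may keep the temporary in a register, so the "extra" copy need not correspond to an extra memory access, though that depends on the compiler and target.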

StilesCrisis