How to count the number of computations in the loop?

Question

I'm reading this textbook:

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson).

Currently I am unsure about Problem 5.5 in the text book. Below, I have attached a link to the book to make it easier to find.

And this is the answer they give:

Could someone please tell me how the authors arrive at the answer for this question? I am not sure about how they count the number of computations (n additions and 2n multiplications).

Thank you in advance, any help is appreciated :)

View Textbook Online

The answer '*n* additions' has overlooked the addition `i++`, and the *effective* summation (subtraction) needed by `i <= degree`. — Weather Vane, Jun 26 '21 at 14:28
Think through execution as the computer will. Lines 7 and 8 do 2 floating point multiplications and one addition. The loop runs n times. The second question relies on information in the text that you haven't provided. @WeatherVane is correct; older texts will normally do this. Integer addition was very much faster than floating point, so ignored. By 2016 that wasn't a safe assumption, though it's reasonable to ignore for other reasons. Also note that this is a terrible way to implement polynomial evaluation. Look up Horner's Rule for a better one. Or maybe that's the next problem :) — Gene, Jun 26 '21 at 14:37
Still, the text states that a floating point addition is only 3 clock cycles. The minimum 1 cycle needed for each integer addition is significant. — Weather Vane, Jun 26 '21 at 14:41
@Gene I see it now, thank you for the explanation! :) Yep Horner is the next problem. Much appreciated :) — Megan Darcy, Jun 26 '21 at 14:58
@WeatherVane I've been told this tb has some mistakes so this could be one of them. Thanks for pointing it out though! Will keep it in mind. much appreciated :) — Megan Darcy, Jun 26 '21 at 14:59
@WeatherVane: FP add has 3 cycle *latency*, FP mul has 5 cycle *latency*. Integer add is not part of the `xpwr*=x` critical path dependency chain, so not it's not significant. You don't just add cycles. (And based on Megan's previous questions, the machine model they're using is Haswell-like, so overall throughput isn't significant either. This is such a latency bottleneck we don't really need to look at front-end or back-end throughput). So yes the total number of additions should include `i++`, because they didn't say "FP additions", but no it's not significant for performance. — Peter Cordes, Jun 26 '21 at 15:22
@MeganDarcy: The *global edition* of CS:APP 3rd edition has many serious errors in the asm practice problems, since the publishers hired some incompetent people to write different problems, without checking with the textbook authors. See https://csapp.cs.cmu.edu/3e/errata.html and [CS:APP example uses idivq with two operands?](https://stackoverflow.com/q/57998998) . The non-global edition doesn't have these errors so the "problems" are usable. — Peter Cordes, Jun 26 '21 at 15:25
@PeterCordes thank you for your answer haha it really cleared things up! this whole topic is confusing and doesn't help that the book is filled with errors :( I will try to get my hands on the non global version too. :) — Megan Darcy, Jun 26 '21 at 15:54
BTW, the only problem I see with this exercise from the book is lack of precision between FP addition vs. addition in general. All the rest of their analysis is pretty sane. So maybe the people the publishers hired knew something about CPU architecture, but not much about x86-64 assembly. Still, it's pretty much inexcusable to present something as compiler output when it contains impossible instructions, so I wouldn't trust the quality of their work anywhere. It might be just luck that they got things right on this problem. — Peter Cordes, Jun 27 '21 at 15:25
Related Q&A about the same practice problem: [Latency bounds and throughput bounds for processors for operations that must occur in sequence](https://stackoverflow.com/q/63095394) — Peter Cordes, Jan 13 '23 at 13:17

hsuecu · Answer 1 · 2021-09-27T11:44:07.073

1

result += .... is done n times.

and on rhs of line 7 and 8 there are n+n multiplications. It is clearly visible. What is that matter of confusion

edited Sep 27 '21 at 11:44

answered Jun 26 '21 at 14:42

hsuecu

107
8

1

That's the number of *FP* additions and multiplications. (And yes that seems obvious to me, too) However, there's also an integer addition in `i++`. It's not clear from the question if that's the only source of confusion or not. It's pretty clear from context that the only operations we really need to worry about for performance are the FP operations, and specifically their latency. – Peter Cordes Jun 26 '21 at 15:57
the loop operation is the defining factor. it is not relevent to focus on it – hsuecu Jun 27 '21 at 10:57
@PeterCordes the i++ thingy which you mentioned is the part of the loop , not the part of the intended mathematical stuff which you want to do. If you want to count the overhead of i++ which in assembly is just "INC %ECX", you can do it. To make observation over time complexity we just need a rough overview not the neck-to-neck measurment. – hsuecu Sep 27 '21 at 11:40

How to count the number of computations in the loop?

1 Answers1

Linked