0

Suppose that I have a boolean function isCorrect(Set<Integer>).
The parameter of the function is calculated by another function buildSet().

Which one is better in terms of both time and space efficiency?

Set<Integer> set = buildSet();
if(isCorrect(set))
    doSomethingWith(set);

or

if(isCorrect(buildSet()))
    doSomethingWith(buildSet());
padawan
  • 1,295
  • 1
  • 17
  • 45
  • If you're only referring to the value once then there is no difference -- they will generate essentially the same bytecode and perform equally fast. If you will need to refer to the value a second time then the first is technically "better", but sometimes the code is clearer with something like the second (or sometimes the first is clearer). You have to exercise judgment, I'm afraid. – Hot Licks May 17 '14 at 14:42
  • An important point is that the first approach is easier to test, since the result from `buildSet` can be easily examined in the debugger. For this reason I tend to prefer the first, even for a single access, if the value being retrieved is not reasonably predictable or the called method has not been well-tested. – Hot Licks May 17 '14 at 14:46

4 Answers4

4

The first approach is better, and I don't think this is a matter of opinion. Don't call the same function twice wastefully when you already have its result. Of course, I'm assuming buildSet() doesn't have any necessary side effects.

Which one is better in terms of both time and space efficiency?

In terms of time, you're building the set once in the first snippet and twice in the second snippet, so presumably the second would take longer. In terms of space, there likely will not be a difference. However, you seem to be instantiating two objects in your second snippet and just one in your first (again, I can't be sure of this because I don't know how buildSet() is implemented). If this is the case and you're retaining both of those objects, then the second snippet will use twice the space as well.

arshajii
  • 127,459
  • 24
  • 238
  • 287
  • If `buildSet` is just a getter then it would likely be inlined by the JITC and there would be no difference. If it actually constructs the Set, of course, then repeating the call should be avoided, as creating an object is generally pretty expensive. – Hot Licks May 17 '14 at 14:44
  • 2
    @HotLicks If `buildSet()` is a getter, it's incredibly poorly (*misleadingly*, even) named. – Gareth Latty May 17 '14 at 14:44
  • @HotLicks Yes, although presumably `buildSet()` is doing some "building". – arshajii May 17 '14 at 14:45
  • 1
    @Lattyware - That would be the first poorly-chosen name to ever appear in SO. – Hot Licks May 17 '14 at 14:47
  • @HotLicks I'm not saying it wasn't worth mentioning, just that if the user was in that case they should name their methods accordingly. – Gareth Latty May 17 '14 at 14:51
1

The answer is - it depends. See below:

  • If function call is time-consuming, you definenetely should store the result;
  • Storing the result might make the code a little bit less readable, although readability is not an objective metric;
  • Sometimes you may want to be sure, that first and second usage deals with exactly the same object. When you call function twice, you may get different results. This situation is usually called race condition or data race and in most cases it affects program correctness.

So, to summarize: in most cases it makes sense to store the result. Sometimes (but IMO not too often) it is not really neccesary.

Alexey Malev
  • 6,408
  • 4
  • 34
  • 52
  • 1
    Actually, I find the first snippet *more* readable, maybe that's just me though. – arshajii May 17 '14 at 14:31
  • @arshajii That's why I wrote that it is really subjective :) BTW, I agree with you about that particular snippet. – Alexey Malev May 17 '14 at 14:31
  • If you find the latter more readable, that readability can probably be retained by good naming of the local variable (e.g: `builtSet` - with a more realistic example it would be better, of course). – Gareth Latty May 17 '14 at 14:48
1

While the existing answers give good reasons why storing the value is best, they miss out on what I feel is an important one (in fact, the most important one): in your second example (running the function twice), you introduce a potential race condition.

If buildSet() relies on outside factors (which is highly probable in any non-trivial function - and could become true with later changes), there is a chance that the value changes between the if check and the second call. This could create a subtle and hard to find bug, which would potentially only become visible when you made changes elsewhere, or certain events happened with specific timings.

This is, by itself, a good reason to avoid such a pattern.

Gareth Latty
  • 86,389
  • 17
  • 178
  • 183
1

By going from the second example (two calls) to the first (one call), you will save the time that the second call to buildSet is on the stack. If that call is on the stack 10% of the time, then your speedup will be a factor of 100/90 = 1.11, or 11%. If it is on the stack 50% of the time, the speedup will be a factor of 100/50 = 2, or 100%.

How do you know what fraction of time a function call is on the stack?
It is inclusive wall-clock percent by line.

Not every profiler will tell you this.

  • If it only tells you "self time", not inclusive time, that will not tell you anything about the time spent in a function call.
  • If it only tells you by function, not by line, you can't tell if the second call was the problem (as opposed to the first).
  • If it is only a "CPU-profiler", then if there is any I/O, sleep, or lock wait in the buildSet function or elsewhere in the app, the profiler will act like it doesn't exist.
  • If it does not tell you percent, but only milliseconds or call counts, then you gotta do the math to figure out what the percent of total time in the call is.
  • Call graphs don't tell you, "flame graphs" don't tell you, timelines don't tell you, etc. etc.

One that does tell you is Zoom. Others may, if you can figure out how to tell them what to do. The method I and many people use is random pausing.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135