Why would this algorithm not adhere to its specification? How to attempt proving its correctness?

Question

Write the method getLargeAndSmallLetters that returns a String that contains all upper- and lowercase ASCII letters in the following form (upper- and lowercase letters separated by white space, letter groups separated by comma, no comma at the end):

A a, B b, C c, D d, E e, ..., Z z

The specification of the algorithm does not bear any details about implementation specifics, i.e. the use of a StringBuilder construct, efficiency guidelines the algorithm has to meet or any best practices regarding readability of the implementation.

My version of the algorithm that compiles and returns the desired output

public static String getLargeAndSmallLetters() {
    StringBuilder builder = new StringBuilder();
    for (int i = 64; i < 96; ++i) { // WRONG: hard coded
        if (Character.isLetter(i)) {
            String upper = Character.toString((char)i);
            String lower = Character.toString((char)(i+32));
            builder.append(upper + " " + lower + ", "); // WRONG: in loop
        }
    }
    String result = builder.toString();
    result = result.substring(0, result.length() - 2);
    return result;
 }

What my docent probably would have liked to see is a for loop like this

for (char lower = 'a'; lower <= 'z'; ++lower) {
    Character upper = Character.toUpperCase(lower);
    builder.append(upper)
           .append(" ")
           .append(lower)
           .append(", ");
}

Why would the algorithm be "academically" more correct this way? This could have been solved with simple string concatenation, too.

I first want to understand why my solution is not optimal (but correct IMO). Having a C# background and can only roughly guess reasons as of why my code is not optimal:

for loop goes through too many iterations. I correct this in an "ugly" but correct way with if (Character.isLetter(i)).
it's not as readable
it casts ints to chars. Is there happening any kind of boxing/unboxing that impacts performance?
the way String is implemented in Java, am I creating more immutable Strings inside the string pool by using the string concatenation with the + operator inside my for loop? Would using StringBuilder be more efficient because it internally reuses the " " and ", " Strings more efficiently?

Now what I also would like to do is to provide an "academical argument" that proves the correctness of my answer. Here some ideas:

The compiler optimizes my "mistakes" away and actually generates the same byte code for both the "wrong" and "correct" answer
The JIT compiler executes the same machine code for both "correct" and "wrong" answer implementations
Provide a mathematical proof of correctness of the algorithm

Option 3 seems the most straight forward way of proofing my answers correctness to me.

For options 1 and 2 I think I'd need to correct the for loop to for (int i = 65; i < 91; ++i) and remove the if (Character.isLetter(i)) statement.

Any suggestions, additions, corrections to my thoughts would be greatly appreciated.

Sorry, this site is about code. I'm not reading all this to tell you whether or not you are right or wrong. Contact your professor. — duffymo, Nov 28 '17 at 20:42
@duffymo It's arguably on topic on account of being about algorithms. Still too much text in my opinion, though. — EJoshuaS - Stand with Ukraine, Nov 28 '17 at 20:46
Possibly related: [What is a magic number, and why is it bad?](https://stackoverflow.com/q/47882). About `sb.append("x"+y);` to execute `"x"+y` Java will need to generate code like `new StringBuilder("x").append(y).toString()` which means you are creating additional unnecessary StringBuilder. It will be garbage collected later, but it is code-smell which we may easily avoid by `sb.append("x").append(y)`. — Pshemo, Nov 28 '17 at 20:53
"for (int i = 64; i < 96; ++i) { // WRONG: hard coded" Well, `for (char lower = 'a'; lower <= 'z'; ++lower) { // not hard coded` *is* hard-coded too; it's just more clear what the hard-coded values mean in the latter case. — Andy Turner, Nov 28 '17 at 20:53
"`builder.append(upper + " " + lower + ", "); // WRONG: in loop`" The whole point of `StringBuilder` is you avoid constructing unnecessary strings. If you're concatenating parts of a string, then appending it to append it to a `StringBuilder`, you're doing it inefficiently. — Andy Turner, Nov 28 '17 at 20:54
I honestly don't understand why this gets downvoted. Probably because I put my question into *too* much context? I put quite some effort into elaborating this and in the process probably meshed too many questions into one. It's certainly not about *me* being right or wrong. I'm asking about the correctness of an algorithm. Saying this isn't about code without "reading all this" is quite ridiculous. — Michael Schnerring, Nov 28 '17 at 20:56
Well ... In the "real world" you also have to write code that conforms to your company / department / project's coding standards. And sometimes those coding standards may seem unnecessary or even limiting. Sounds like you have a professor that is enforcing his/her own standards and grading your work accordingly. My advice is to learn what he/she wants and adapt. If you don't like it (with good reason) you can forget it when the class is over. This is a useful skill in the real world too. — MFisherKDX, Nov 28 '17 at 21:09
@duffymo removed the unnecessary context of the question, hope this is not too much to read now — Michael Schnerring, Nov 28 '17 at 21:13
I still think you should go over it with your professor, not here. — duffymo, Nov 28 '17 at 21:16
@duffymo I don't see how the question as is has any relevancy in regards to my professor. The question addresses two implementations of an algorithm and how to prove that my version is as correct as the other one is. — Michael Schnerring, Nov 28 '17 at 21:29

Andy Turner · Accepted Answer · 2017-11-28T21:52:30.787

The compiler optimizes my "mistakes" away and actually generates the same byte code for both the "wrong" and "correct" answer

No, the compiler does very little by way of optimization. It might change some of your string concatenations into StringBuilder appends, but that's about it; it otherwise just faithfully reproduces what you've written in your code, in bytecode form. Basically all of the smartness happens in the JIT.

You can see very easily that they don't produce the same bytecode: decompile them with javap -c <YourClass>.

The JIT compiler executes the same machine code for both "correct" and "wrong" answer implementations

The JIT does execute the code, eventually. For something like this, it will probably execute once, and so not be worth JIT'ing.

But I strongly doubt that the JIT would produce the same code for both approaches, because the two approaches are quite different.

Provide a mathematical proof of correctness of the algorithm

Formal proof is hard in Java. There are an awful lot of subtle semantics that you have to take into account in the proof; or, at least, prove that those semantics aren't involved in the execution of your code.

All you can really do is assert that it is functionally equivalent. As in, not prove that it's correct, but demonstrate that it's not incorrect.

It's pretty easy to write a test for this, as you know the correct output:

A a, B b, C c, ...

So just check that your code produces that, and you're done. Or, of course, check that your code produces the same output as the model answer.

Your answer doesn't actually quite do the same as the model answer. You chop off the , from the end; the model answer does not.

Sorry if this wasn't clear, the model answer chops the comma off, too. This is also stated in the original question "upper- and lowercase letters separated by white space, letter groups separated by comma, no comma at the end" — Michael Schnerring, Nov 30 '17 at 08:59

score 1 · Answer 2 · answered Nov 28 '17 at 21:56

You are fighting a losing battle. If your "docent" can't be bothered to check whether your code produces the right output, he won't be bothered to read a proof of your algorithm's correctness. If you learn one thing from this, it's that you will often have to suffer fools, so get good at it.

That said, a proof of the correctness of your code could be had by showing (a) it accepts no input (b) it is not affected by side-effects (more realistically, not by ones not also affecting your docent's solution) and then conclude by showing (c) that a real execution trace of your algorithm results in the correct output (correct in that it produces the same thing as your docent's).

Given that, there is not a good reason to prefer one implementation to the other. If you prefer the model answer for performance, brevity or elegance, there's a better and simpler solution to this problem than either you or your docent found:

public static String getLargeAndSmallLetters() {
    return "A a, B b, C c, D d, E e, F f, G g, H h, I i, J j, K k, L l, M m, N n, O o, P p, Q q, R r, S s, T t, U u, V v, W w, X x, Y y, Z z";
}

You can make it a bit more readable like this:

public static String getLargeAndSmallLetters() {
    return String.format("%s%s",
        "A a, B b, C c, D d, E e, F f, G g, H h, I i, J j, K k, L l, M m, ",
        "N n, O o, P p, Q q, R r, S s, T t, U u, V v, W w, X x, Y y, Z z");
}

Note: for-looping through chars is terrible and you shouldn't do it. I can't say I've had a colleague try that in his code but if that day should come I will be thankful for the peer code review.

Why bother using `String.format`? Just write it as a concatenation. Compile-time constant string concatenations produce a single string in the string pool. (Actually, it's not even a concatenation: it's a single string literal). — Andy Turner, Nov 28 '17 at 22:35
"Given that, there is not a good reason to prefer one implementation to the other" I disagree. Speed is not the primary criterion (often), but readability is (more often); it just so happens that the model answer - at least with regard to looping between `'a'` and `'z'` - is both more readable (because it doesn't have magic numbers) *and* faster (because the `isLetter` check becomes unnecessary, and it loops fewer times). — Andy Turner, Nov 28 '17 at 22:45
@AndyTurner (1) Indeed, infinitely many equivalent algorithms return the same constant string. (2) I would (and did) strongly dispute the idea that the model answer is "good". I do concede that it is possible that one might prefer one algorithm to another for reasons other than correctness, despite not personally thinking either of these algorithms is particularly readable. — Patrick87, Nov 29 '17 at 01:19

Why would this algorithm not adhere to its specification? How to attempt proving its correctness?

2 Answers2