Which solution is better, faster and readable?

Question

ok, problem: "An isogram is a word that has no repeating letters, consecutive or non-consecutive. Implement a function that determines whether a string that contains only letters is an isogram. Assume the empty string is an isogram. Ignore letter case." and I write 2 solutions

First solution:

Scanner scanner = new Scanner(System.in);
String string = scanner.next();
long start = System.currentTimeMillis();
char words[] = string.toCharArray();
boolean isIsogram=true;
for (int i=(words.length-1); i>=0; i--){
  for(int j=0; j<(i-1);j++){
    if(words[i]==words[j]){
      isIsogram=false;
    }
  }
}
long finish = System.currentTimeMillis();
System.out.println(isIsogram + " time:"+ (finish-start) );

Second solution:

    Scanner scanner = new Scanner(System.in);
    String string = scanner.next();
    long start = System.currentTimeMillis();
    boolean isIsogram = (string.length() == string.toLowerCase().chars().distinct().count());
    long finish = System.currentTimeMillis();
    System.out.println(isIsogram + " time:"+ (finish-start) );

I have tested both solutions and there is results: input: "asd"
1) true time 0
2) true time 113

and I want to know your ideas and opinion which solution is better? My teacher told me 2 solution is better, but 1 solution takes less time, and I am not sure which is better.....

Closely related: [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) — Daniel Pryden, May 02 '19 at 20:16
Are the two solutions actually equivalent? I'm not seeing how the first one ignores case. — hatchet - done with SOverflow, May 02 '19 at 20:25
That function is untested (you have no unit tests) and it is not testable (because it uses `System.in` directly. No matter how readable it is, it's not maintainable. — Thomas Weller, May 02 '19 at 20:34
As for performance - while the second approach has rather big overhead, it will still perform in linear time (the most expensive operation is `distinct()` which however operates on a map so will take linear time). The first solution takes O(n^2) time. In other words, first may well be fast for short strings, but use an input 10000 characters long and the second will be much faster. — Jiri Tousek, May 02 '19 at 20:54

Daniel Pryden · Answer 1 · 2019-05-02T20:42:51.830

The right answer is actually to profile your specific problem and see what works for your requirements.

Readability is important. What approach is most readable is very subjective. Stream-oriented operations usually attack the problem from a more declarative approach rather than an imperative approach. Declarative code is usually much easier to read, but imperative code is often faster.

But how fast do you need to be? Even your (very flawed) benchmark shows only a difference of 100 milliseconds. That's faster than the threshold of human perception. If your code isn't too slow, then don't worry about making it faster. Worry about making it clear, maintainable, debuggable, and correct first.

In any case, since this is a fun problem, I poked at it for a minute. You have 2¹⁶ possible char values in a String, so if you use a BitSet you can have a yes/no bit for each of them, and still fit the whole thing in 8K of memory.

Your question said to do case folding. That sounds like a simplification, but it's really not, unless your data is ASCII (in which case, you only need a 256-bit BitSet, or possibly only a 26-bit one!). If you can use the full range of Unicode characters, even the problem of reliably converting from upper case to lower case becomes almost impossible to do correctly. (Case conversion is ultimately locale-specific.)

So I'm going to assume you want to handle all possible char values, which won't handle UTF-16 surrogates (like you need for emoji) correctly, but should handle everything that's considered a "letter" in alphabetic languages.

Here's what I came up with:

static boolean isIsogram(String text) {
    java.util.BitSet bits = new java.util.BitSet(1 << 16);
    for (int i = 0; i < text.length; i++) {
        int ch = (int) text.charAt(i);
        if (bits.get(ch)) {
            return false;
        }
        bits.set(ch);
    }
    return true;
}

"this doesn't handle UTF-16 surrogates correctly" the spec said to ignore case, it didn't say to treat surrogates as equivalent. So I think you're good on that point. — mypetlion, May 02 '19 at 20:30
100 ms on `"asd"`. How long might it take for a string that is four times that long? Maybe already outside perception specs. You talk about benchmarking, but have you benchmarked that BitSet solution? — Thomas Weller, May 02 '19 at 20:57

score -2 · Accepted Answer · answered May 02 '19 at 20:29

A few things on readability:

First codeblock:

I would have the counters counting in the same direction-- it will still compare each word this way. It's not a terribly important change, but it can save the reader a step so that they don't have to do any mental math to determine if the code is producing the intended result, since the result is apparent (it's easy to see that the code's time complexity is O(n^2)).

boolean isIsogram = true; 
Scanner scanner = new Scanner(System.in);
String string = scanner.next();
long start = System.currentTimeMillis();
char words[] = string.toCharArray();

for (int i = 0 ; i <= words.length - 1; i++){
    for(int j = 0; j <= words.length - 1; j++){
         if(words[i] == words[j]){
             isIsogram = false;
             break; //I agree with the first answer
         }
    }
    if (!isIsogram)
        break; 
}  

long finish = System.currentTimeMillis();
System.out.println(isIsogram + " time:" + (finish-start) );

The second codeblock is quite readable, although I may be primed towards understanding the problem and so it might actually be more readable because of that. But the calls to compare distinct characters make complete sense in terms of the goal.

Which solution is better, faster and readable?

2 Answers2