2

I stumble upon this cool sorting algorithm with O(n) time complexity on GeeksforGeeks - Sort string of characters and I was trying to refactor the code to use Java Streams on the nested for loops instead and collect the result in a string variable but I can't seem to wrap my head around it.

Here's the original code:

// Java program to sort
// a string of characters
public class SortString{
    static final int MAX_CHAR = 26;
 
    // function to print string in sorted order
    static void sortString(String str) {
 
        // Hash array to keep count of characters.
        int letters[] = new int[MAX_CHAR];
 
        // Traverse string and increment
        // count of characters
        for (char x : str.toCharArray()) {
 
            // 'a'-'a' will be 0, 'b'-'a' will be 1,
            // so for location of character in count
            // array we will do str[i]-'a'.
            letters[x - 'a']++;
        }
        
        // HOW TO CONVERT THIS TO JAVA STREAM?  
        // Traverse the hash array and print
        // characters
        for (int i = 0; i < MAX_CHAR; i++) {
            for (int j = 0; j < letters[i]; j++) {
                System.out.print((char) (i + 'a'));
            }
        }
    }
 
    // Driver program to test above function
    public static void main(String[] args) {
        sortString("geeksforgeeks");
    }
}
// This code is contributed
// by Sinuhe
ColstonBod-oy
  • 63
  • 1
  • 6

5 Answers5

2

We can sort the given string, which is expected to contain only lower case English letters, in O(n) time with streams only (without using a precalculated array) by making use of the LinkedHashMap, which would be used internally inside the collector.

We can create a custom collector which will accumulate stream elements into a LinkedHashMap by using static method Collector.of(). After processing the whole string, a LinkedHashMap will contain a frequency of every lower case English letter in the given string and finisher function of the collector will turn this map into the resulting string.

public static void main(String[] args) {
    String str = "aewwawaeaea";

    String result = str.chars()
        .boxed()
        .collect(Collector.of(
            () -> IntStream.rangeClosed('a', 'z')
                .collect(LinkedHashMap::new, (map, next) -> map.put(next, 0), LinkedHashMap::putAll),
            (LinkedHashMap<Integer, Integer> map, Integer next) -> map.merge(next, 1, Integer::sum),
            (left, right) -> {
                right.forEach((k, v) -> left.merge(k, v, Integer::sum));
                return left;
            },
            map -> map.entrySet().stream()
                .map(entry -> Character.toString(entry.getKey()).repeat(entry.getValue()))
                .collect(Collectors.joining())
        ));
    
    System.out.println(result);
}

Output:

aaaaaeeewww

As it has been suggested by @Holger in the comments, we can build a collector on based on the int array of 26 elements.

The collector is more concise and performant than the previous one:

public static void main(String[] args) {
    String str = "aewwawaeaea";
    String result = str.chars()
        .boxed()
        .collect(Collector.of(
            () -> new int['z' - 'a' + 1],
            (int[] arr, Integer next) -> arr[next - 'a']++,
            (left, right) -> {
                Arrays.setAll(left, i -> left[i] + right[i]);
                return left;
            },
            arr -> IntStream.range(0, arr.length)
                .mapToObj(i -> Character.toString(i + 'a').repeat(arr[i]))
                .collect(Collectors.joining())
        ));
    
    System.out.println(result);
}

Output:

aaaaaeeewww
Alexander Ivanchenko
  • 25,667
  • 5
  • 22
  • 46
1

You can use IntStream to convert the nested loop to a stream, like this:

import java.util.*;
import java.util.stream.IntStream;

public class ABC{
  static final int MAX_CHAR = 26;
  static void sortString(String str) {
    int letters[] = new int[MAX_CHAR];
    for (char x : str.toCharArray()) {
      letters[x - 'a']++;
    }
    IntStream.range(0, letters.length).forEach(i -> {
      IntStream.range(0, letters[i]).forEach(ch -> {
        System.out.print((char) (i + 'a'));
      });
    });
  }

  // Driver program to test above function
  public static void main(String[] args) {
    sortString("geeksforgeeks");
  }
}

Working link.

Charchit Kapoor
  • 8,934
  • 2
  • 8
  • 24
1

The best snippet that doesn't require precalculated array I am able to produce is this:

final String result = str.chars()
    .boxed()
    .collect(
        Collectors.collectingAndThen(
            Collectors.groupingBy(
                ch -> ch - 'a',
                TreeMap::new,
                Collectors.counting()),
            map -> map.entrySet().stream().collect(
                StringBuilder::new,
                (sb, e) -> sb.append(Character.toString(e.getKey() + 'a')
                                              .repeat(e.getValue().intValue())),
                StringBuilder::append)
        )
    ).toString();
  • The inner groupingBy groups the characters to the position in an alphabet (key) and the count (value). The keys for the count is 0 are omitted (they don't get into the map at all). The output after this part looks like this:

    {4=4, 5=1, 6=2, 10=2, 14=1, 17=1, 18=2}
    

    Note I used TreeMap to keep the characters sorted.

  • The collectingAndThen wraps the Map<Integer, Long> and using a mutable reduction in another stream to resull in StringBuilder with the correct position and amount of the characters (note I use String#repeat as of Java 11 that can be backported or easily substituted by a custom/utility method).

Conclusion:

  • Is this solution in the declarative spirit of ? Yes, it is.
  • Is this solution better? Nope. It is not even more readable or nice piece of code to look at. It is ok for sake of getting more familiar with streams.
  • Is this solution more efficient? Nope, it is not 0(n) anymore. Remember it is not always possible to represent iterative for loop(s) directly by a declarative approach using Stream API and achieve the same performance.
Nikolas Charalambidis
  • 40,893
  • 16
  • 117
  • 183
  • It is not `O(n)`. –  Jul 11 '22 at 05:18
  • 1
    I didn't declare the solution is `O(n)` (I explicitly add this info for sure). The solution is, though, in the declarative spirit of Stream API. – Nikolas Charalambidis Jul 11 '22 at 05:18
  • @英語は苦手 Well, technically this solution has a time complexity of `O(n)`, because the map will contain at most `26` entries and therefore adding new entry into the map (or updating existing entry) in the worst case scenario would be proportional to `log(26)` which should be considered `O(1)`, **not** `O(log n)`. Which gives the overall time complexity of `O(n)`. We can lower the cost of updating the entries by making use of `LinkedHashMap`. – Alexander Ivanchenko Jul 11 '22 at 09:48
  • 1
    @AlexanderIvanchenko when you use a `LinkedHashMap`, the result will not be sorted anymore. What you could do, is use an array instead of a map inside the collector, just like with the original approach. – Holger Jul 11 '22 at 10:01
  • @Holger I meant a `LinkedHashMap` prepopulated with `26` entries, all having initial `0` values. That would guarantee that the result would be sorted. – Alexander Ivanchenko Jul 11 '22 at 10:04
  • 1
    @AlexanderIvanchenko but the idea was not to have a prepopulated data structure. As said, if you go that route, a plain array would be even simpler and more efficient. A collector around an array is not worse than a collector around a map. – Holger Jul 11 '22 at 10:07
  • @Holger *plain array would be even simpler and more efficient* - Agree. – Alexander Ivanchenko Jul 11 '22 at 10:08
1

Try this.

static final int MAX_CHAR = 26;

static String sortString(String str) {
    int[] letters = new int[MAX_CHAR];
    str.chars().forEach(i -> letters[i - 'a']++);
    return IntStream.range(0, MAX_CHAR)
        .mapToObj(i -> Character.toString(i + 'a').repeat(letters[i]))
        .collect(Collectors.joining());
}

public static void main(String[] args) {
    System.out.println(sortString("geeksforgeeks"));
}

output:

eeeefggkkorss
  • 2
    Though this is a good idea, you still need the predefined `letters` array. – Nikolas Charalambidis Jul 11 '22 at 05:15
  • The line `str.chars().forEach(i -> letters[i - 'a']++)` has no advantage over the first `for` loop in the original solution and violates guidelines of the [*Stream API documentation*](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/package-summary.html#SideEffects) in regard to side-effects. When there are other alternatives (which the case here), you should not resort to changing the state of objects outside the stream via side-effects. – Alexander Ivanchenko Jul 11 '22 at 10:01
  • 1
    @AlexanderIvanchenko the loop of the original solution uses `toCharArray()` which is avoided here. So there is a slight advantage. – Holger Jul 11 '22 at 10:11
  • @Holger Agree, generating char array from the string is redundant. Approach with traditional index based loop would be better `for (int i = 0; i < str.length(); i++) letters[str.charAt(i) - 'a']++;`. – Alexander Ivanchenko Jul 11 '22 at 10:17
  • 1
    @AlexanderIvanchenko on the other hand, you can just use `char[] array = str.toCharArray(); Arrays.sort(array); System.out.println(array);` the built-in method will fall back to counting sort for large arrays anyway. – Holger Jul 11 '22 at 10:29
  • @Holger *will fall back to **counting sort** for large arrays* - that a new observation for me, thanks! – Alexander Ivanchenko Jul 11 '22 at 10:33
  • 1
    @AlexanderIvanchenko the implementation notes about the used algorithms are misleading and demonstrate why authors should be avoid mentioning such implementation details in the documentation, as I already wrote in [this answer](https://stackoverflow.com/a/32334651/2711488) – Holger Jul 11 '22 at 10:37
1

For example (Java 11+):

String foo(String s) {
    return s.chars().boxed().collect(Collectors.collectingAndThen(
            Collectors.groupingBy(Function.identity(), Collectors.summingInt(ignore -> 1)),
            integerLongMap -> IntStream.rangeClosed('a', 'z')
                    .mapToObj(value -> Character.toString(value).repeat(integerLongMap.getOrDefault(value, 0)))
                    .collect(Collectors.joining())
    ));
}


System.out.println(foo("bbccdefbbaa"));  // aabbbbccdef
System.out.println(foo("geeksforgeeks"));  // eeeefggkkorss

But it's harder to read than iterative version, imho.

Arsegg
  • 126
  • 1
  • 10