I am doing the following programming exercise: Numericals of a String. The statement is:
You are given an input string.
For each symbol in the string if it's the first character occurence, replace it with a '1', else replace it with the amount of times you've already seen it...
But will your code be performant enough? Examples:
input = "Hello, World!" result = "1112111121311" input = "aaaaaaaaaaaa" result = "123456789101112"
There might be some non-ascii characters in the string.
Note: there will be no int domain overflow (character occurences will be less than 2 billion).
I have written the following answer:
import java.util.*;
import java.util.stream.*;
public class JomoPipi {
public static String numericals(String s) {
System.out.println("s: "+s);
Map<String, Long> ocurrences = Arrays.stream(s.split("")).
collect(Collectors.groupingBy(c -> c,
Collectors.counting()));
System.out.println("ocurrences: "+ocurrences.toString());
StringBuilder result = new StringBuilder();
for(int i = s.length()-1; i >= 0; i--){
String c = String.valueOf(s.charAt(i));
result.append(ocurrences.get(c) + " ");
ocurrences.put(c, ocurrences.get(c)-1);
}
System.out.println("result: "+result.toString());
String[] chars = result.toString().split(" ");
Collections.reverse(Arrays.asList(chars));
String sorted = String.join("",chars);
System.out.println("sorted: "+sorted);
return sorted;
}
}
However it times out (execution time is above 16000 ms), when input string is big.
To see how it works there is a trace with a very small input string:
s: Hello, World!
result: 1 1 3 1 2 1 1 1 1 2 1 1 1
sorted: 1112111121311
Besides, I have also written the following alternative answer:
import java.util.*;
import java.util.stream.*;
public class JomoPipi {
public static String numericals(String s) {
System.out.println("s: "+s);
Map<String, Long> ocurrences = Arrays.stream(s.split("")).
collect(Collectors.groupingBy(c -> c,
Collectors.counting()));
String[] result = new String[s.length()];
for(int i = s.length()-1; i >= 0; i--){
String c = String.valueOf(s.charAt(i));
result[i] = String.valueOf(ocurrences.get(c));
ocurrences.put(c, ocurrences.get(c)-1);
}
System.out.println("result: "+Arrays.toString(result));
return String.join("",result);
}
}
Even so, it stills timing out.
Here is a trace with a small input string:
s: Hello, World!
result: [1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 3, 1, 1]
How could we improve the solution? What algorithm would perform better enough to handle very big input strings? Where is the bottleneck we should debug and avoid, to improve this answer?
To try to solve it by myself I have read:
- How do I count the number of occurrences of a char in a String?
- Hashmap implementation to count the occurrences of each character
- How do I reverse an int array in Java?
- https://www.codewars.com/kata/5b4070144d7d8bbfe7000001/discuss
- HashMap to return default value for non-found keys?
- What is the difference between putIfAbsent and computeIfAbsent in Java 8 Map ?
EDIT: Here we have an answer based on @khelwood suggestion:
import java.util.*;
import java.util.stream.*;
public class JomoPipi {
public static String numericals/*->*/(String s) {
Map<String, Integer> ocurrences = new HashMap<String,Integer>();
StringBuilder result = new StringBuilder();
for(int i = 0; i < s.length(); i++){
String c = String.valueOf(s.charAt(i));
ocurrences.putIfAbsent(c, 0);
ocurrences.put(c,ocurrences.get(c)+1);
result.append(ocurrences.get(c));
}
return result.toString();
}
}