First, convert your input string into a map that maps characters onto a sorted list of indices. For example:
String input = "abcnc";
var store = new HashMap<Character, TreeSet<Integer>>();
for (int i = 0; i < input.length(); i++) {
store.computeIfAbsent(input.charAt(i), k -> new TreeSet<Integer>()).add(i);
}
System.out.println(store);
This makes: {a=[0], b=[1], c=[2, 4], n=[3]}
And the cost for making this thing can be written off as 'preprocessing', but if it matters, O(nlogn).
Armed with store
, you can do the job in O(log n)
time. For example, if I want to know how many c
s are in the 3-5 range, I for ask for the TreeSet match c (getting me the [2, 4]
treeset). Then I can use treeset's headSet and tailSet methods to figure it out, which are each O(logn) operations.
This gives me a total runtime of O(logn)
which is as near to O(1)
as makes it irrelevant (in the sense that practical concerns about how modern computer architecture works will dwarf this). If an interviewer does not accept that answer, the are either needlessly pedantic or wildly misguided about how modern computers work, so now we delve into a purely academic exercise to knock it down to an O(1)
.
O(1)
instead of mapping the character to a TreeSet, we instead map it to an int[]
. This int array is as large as the entire input (so in this case, the 4 int[] arrays for keys 'a', 'b', 'c', and 'n' are all 5 large because the input is 5 large). This int array answers the question: If you asked me for the answer from THIS position to the end of the string, what is the correct answer? So, for c it would be: [2, 2, 2, 1, 1]. Note that the last number (0) is missing as we don't need it (the amount of Xs from the end of the string to the end of the string is.. of course, 0, always, regardless of what character we are talkkng about). Had the string input been abcnca, then the int arrays are 6 large and for c, would contain [2, 2, 2, 1, 1, 0].
Armed with this array, the answer can be provided in O(1)
time: It's 'the answer I would give if you asked me to go from start index to the end-of-string', minus 'the answer I would give if you asked me to go from end index to the end-of-string'. Taking into account, of course, that if the question's end index matches end-of-string, just look up the answer in the int array (don't subtract anything).
This means the time taken after preprocessing is O(1), but the size of the 'preprocessed' data structure is now considerable. For example, if the input string is 1 MB large and contains 5000 different characters, it's a 20GB table (4 bytes per number, 5000 entries in the map, each entry mapping to an array with a million entries, at 4 bytes a pop, is 5000*1000000*4 = 20GB
).