As I mentioned in my comments:
In your code the lines:
correct_index = min(len(item) - 1, col)
letter = ord(item[-(correct_index + 1)]) - min_base
Always uses the first letter of the word once col is greater than the word length. This
causes shorter words to be sorted based upon their first letter once
col is greater than the word length. For instance ['aa', 'a'] remains
unchanged since on the for col loop we compare the 'a' in both words,
which keeps the results unchanged.
Code Correction
Note: Attempted to minimize changes to your original code
def count_sort_letters(array, size, col, base, max_len):
""" Helper routine for performing a count sort based upon column col """
output = [0] * size
count = [0] * (base + 1) # One addition cell to account for dummy letter
min_base = ord('a') - 1 # subtract one too allow for dummy character
for item in array: # generate Counts
# get column letter if within string, else use dummy position of 0
letter = ord(item[col]) - min_base if col < len(item) else 0
count[letter] += 1
for i in range(len(count)-1): # Accumulate counts
count[i + 1] += count[i]
for item in reversed(array):
# Get index of current letter of item at index col in count array
letter = ord(item[col]) - min_base if col < len(item) else 0
output[count[letter] - 1] = item
count[letter] -= 1
return output
def radix_sort_letters(array, max_col = None):
""" Main sorting routine """
if not max_col:
max_col = len(max(array, key = len)) # edit to max length
for col in range(max_col-1, -1, -1): # max_len-1, max_len-2, ...0
array = count_sort_letters(array, len(array), col, 26, max_col)
return array
lst = ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
print(radix_sort_letters(lst))
Test
lst = ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
print(radix_sort_letters(lst))
# Compare to Python sort
print(radix_sort_letters(lst)==sorted(lst))
Output
['a', 'aa', 'ab', 'abid', 'abs', 'asd', 'avc', 'axy']
True
Explanation
Counting Sort is a stable sort meaning:
Let's walk through an example of how the function works.
Let's sort: ['ac', 'xb', 'ab']
We walk through each character of each list in reverse order.
Iteration 0:
Key is last character in list (i.e. index -1):
keys are ['c','b', 'b'] (last characters of 'ac', 'xb', and 'ab'
Peforming a counting sort on these keys we get ['b', 'b', 'c']
This causes the corresponding words for these keys to be placed in
the order: ['xb', 'ab', 'ac']
Entries 'xb' and 'ab' have equal keys (value 'b') so they maintain their
order of 'xb' followed by 'ab' of the original list
(since counting sort is a stable sort)
Iteration 1:
Key is next to last character (i.e. index -2):
Keys are ['x', 'a', 'a'] (corresponding to list ['xb', 'ab', 'ac'])
Counting Sort produces the order ['a', 'a', 'a']
which causes the corresponding words to be placed in the order
['ab', 'ac', 'xb'] and we are done.
Original Software Error--your code originally went left to right through the strings rather than right to left. We need to go right to left since we want to sort our last sort to be based upon the first character, the next to last to be based upon the 2nd character, etc.
Different Length Strings-the example above was with equal-length strings.
The previous example was simplified assuming equal length strings. Now let's try unequal lengths strings such as:
['ac', 'a', 'ab']
This immediately presents a problem since the words don't have equal lengths we can't choose a letter each time.
We can fix by padding each word with a dummy character such as '*' to get:
['ac', 'a*', 'ab']
Iteration 0: keys are last character in each word, so: ['c', '*', 'b']
The understanding is that the dummy character is less than all other
characters, so the sort order will be:
['*', 'b', 'c'] causing the related words to be sorted in the order
['a*', 'ab', 'ac']
Iteration 1: keys are next to last character in each word so: ['a', 'a', 'a']
Since the keys are all equal counting sort won't change the order so we keep
['a*', 'ab', 'ac']
Removing the dummy character from each string (if any) we end up with:
['a', 'ab', 'ac']
The idea behind get_index is to mimic the behavior of padding strings without
actual padding (i.e. padding is extra work). Thus, based upon the index
it evaluates if the index points to the padded or unpadded portion of the string
and returns an appropriate index into the counting array for counting.