2

I am trying to sort lists within a list in alphabetical order using radix sort. I need the lists to be sorted by one of the attributes of an object I've created.

Note: I cannot use built-in sorts - I have to write my own. I was not allowed to use defaultdict so I used lists.

I have a list called results[]. In results[x] for all of results[] I have another list containing words of length x. Those words are stored as word objects containing the original word (originalWord), the alphabetized word (azWord), and its length (wLength). e.g. dog, dgo, 3.

Since I have many, many words, I've decided the radix sort would be most efficient for my purposes. I am relatively new to Python so I'm having some trouble writing the code correctly. I have a rough outline of what it should look like but I would appreciate help with fixing the syntax.

I am planning on using radix_sort in a for loop that iterates through results[]. I have a variable, maxL, that stores the longest word I have (aka the number of lists in results).

for x in range(0,maxL):
    radix_sort(results[x], x)

Here is my attempt at writing radix sort for strings. Please note that the azWord attribute is represented as a char list.

def radix_sort(List, length):
    buckets = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
    for i in range (0, length-1):    #for every letter "column"
        for word in List:    #for every word 
            index = ord(word[i].azWord)-ord('a')   #get the index of the word
            buckets[index].append(word)     #add word object to correct bucket
    for containedList in buckets:
        while(containedList):
            #I'm having trouble here with emptying the lists back into the bins

EDIT: Also, since I don't want to run out of memory (this is being done for a very long list of words), should I be clearing some things as I go that I don't need?

Also, currently, Eclipse is giving me an error "Expected:: Expected::" for this line:

for i in range (0, length-1)

Current version:

def radix_sort(List, length):
    buckets = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
    for i in range (length-1, -1, -1):    #for every letter "column"
        for word in List:    #for every word 
            index = ord(word.azWord[i])-ord('a')   #get the index of the word
            buckets[index].append(word)     #add word object to correct bucket
    List[:] = []
    for containedList in buckets:
        List.extend(containedList)
Michi
  • 125
  • 3
  • 11
  • Have you looked at an [example implementation](http://en.wikipedia.org/wiki/Radix_sort#Example_in_Python)? – Fred Larson Apr 03 '14 at 15:19
  • Yes but I didn't find it particularly useful for my case since I'm dealing with strings within objects that are in lists of lists. I'm getting very thrown off by the syntax and how to access all of the information. – Michi Apr 03 '14 at 15:22

2 Answers2

1

I think you're missing some colons on these lines:

for i in range (0, length-1)  # Need a colon here

for word in List  # Need a colon here

for containedList[] in buckets  # Need a colon here

while(containedList[])  # Need a colon here
Fred Larson
  • 60,987
  • 18
  • 112
  • 174
  • Could you rewrite your answer without those actual colons? Right now it sounds confusing and I spent a while thinking what colons are missing... – Dunno Apr 03 '14 at 15:23
  • Ah yes. Definitely. I've updated the original post. I'm still a little confused about emptying the buckets. – Michi Apr 03 '14 at 15:25
1

To put the sorted results back into the list:

List[:] = []
for containedList in buckets:
    List.extend(containedList)

One more thing, you'll need to sort from least significant to most significant if you expect the proper sort:

for i in range(length-1, -1, -1):

Note that your original range was incorrect anyway, the end point isn't included in the range so stopping at length-1 was going to skip the last letter.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • I've updated my question with the "latest version". It seems to be working - thank you. I am trying to see if my sorting came out correctly, now. What should be the syntax if I wanted to add a print statement in my `for x in range(0,maxL)` loop? I want something along the lines of print(results[x].azWord) (to make sure they are indeed in alphabetical order and all words have been sorted). – Michi Apr 03 '14 at 15:36
  • @Michi, how about just `print results[x]`? I assume it's just for debugging purposes so it doesn't have to be pretty. – Mark Ransom Apr 03 '14 at 15:38
  • `print results[x]` just gives me hex, so it's not very useful. I actually do need it to be pretty and figure out how to access both the originalWord and azWord attributes in the next step of my program. Right now I want it to print both originalWord and azWord for all the word objects in results[] (post radix sort). – Michi Apr 03 '14 at 15:43
  • If you're getting hex then the code you posted isn't representative of your actual code. I think you're on your own here. – Mark Ransom Apr 03 '14 at 15:45
  • Isn't that what's supposed to happen? I'm trying to print objects, not strings. – Michi Apr 03 '14 at 15:47
  • @Michi sorry you're right. Add a `__repr__` method to your class to make it show the contents. See e.g. http://stackoverflow.com/questions/3691101/what-is-the-purpose-of-str-and-repr-in-python – Mark Ransom Apr 03 '14 at 15:53
  • I added `__repr__` but I still don't know how to use it since I can only do it from an object and the results[x] is a list of objects. How do I access that specific list? – Michi Apr 03 '14 at 16:21
  • @Michi when you print a list you'll get the `__repr__` string from each object in the list. If you need anything more than that I think it's time for a new question. – Mark Ransom Apr 03 '14 at 16:27