0

I am measuring time of execution of Radix and Counting sort using timeit module. I am using 100 sets of random integers lying on the interval <0; 1000000>. All integers are unique within the set. First set consists of 10000 integers, last set 1000000 integers. Each set is ten times sorted and an average time is recorded (as a full time/10). In log file of Radix sort there are some strange results and I am not sure if it is a problem of timeit module or my sorting algorithm:

Radix sort log

integers count, average time
......,.............
760000,1.51444417528
770000,1.31519716697
780000,1.33663102559
790000,1.3484539343
800000,1.37114722616
810000,1.61706798722
820000,1.4034960851
830000,1.65582925635
840000,1.68017826977
850000,1.69828582262
860000,1.47601140561
870000,1.73875506661
880000,1.75641094733
890000,1.54894320189
900000,1.80121665926
910000,1.56070168632
920000,1.8451221867
930000,1.8612749805
940000,1.61202779665
950000,1.63757506657
960000,1.64939744866
970000,1.66534313097
980000,1.68155078196
990000,1.69781920007
1000000,2.00389959994

You can see that sorting of larger set than previous takes sometimes less time. In case of Counting Sort is time increasing normally.

Here is my code for Radix Sort:

from __future__ import division

def sortIntegerList (listToSort, base):
    maxkey = len(str(max(listToSort)))

    for i in range(maxkey):
        bucketList = [[] for x in range(base)]

        for number in listToSort:
            bucketList[(number//base**i) % base].append(number)

        listToSort = []

        for l in bucketList:
            listToSort.extend(l)

    return listToSort

Here is my code for Counting Sort:

def sortIntegerList (listToSort):
    maxkey = max(listToSort)
    countingList = [0 for x in range(maxkey + 1)]

    for i in listToSort:
        countingList[i] += 1

    for i in range(1, len(countingList)):
        countingList[i] += countingList[i-1]

    sortedList = [0 for x in range(len(listToSort) + 1)]

    for i in listToSort:
        sortedList[countingList[i]] = i
        countingList[i] -= 1

    del sortedList[0]
    return sortedList

Here is code for measuring the time of execution:

import timeit

outputFileCounting = "count,time\n"
outputFileRadix = "count,time\n"

# Counting Sort
for x in range(10, 1001, 10):
    setup_counting = """
from sorters import counting_sort
import cPickle
with open("ri_0-1000k_{0}k.pickle", mode="rb") as f:
    integerList = cPickle.load(f)
        """.format(x)

    time_counting = timeit.timeit("""counting_sort.sortIntegerList(integerList)""",
                                setup = setup_counting, number=10)

    outputFileCounting += "{0},{1}\n".format(str(x*1000), time_counting/10)

    with open("sort_integer_counting_results.csv", mode="w") as f:
        f.write(outputFileCounting)

# Radix Sort
for x in range(10, 1001, 10):
    setup_radix = """
from sorters import radix_sort
import cPickle
with open("ri_0-1000k_{0}k.pickle", mode="rb") as f:
    integerList = cPickle.load(f)
        """.format(x)

    time_radix = timeit.timeit("""radix_sort.sortIntegerList(integerList, 10)""",
                                setup = setup_radix, number=10)

    outputFileRadix += "{0},{1}\n".format(str(x*1000), time_radix/10)

    with open("sort_integer_radix_results.csv", mode="w") as f:
        f.write(outputFileRadix)

Each integer set is stored as a list in pickle file.

gturri
  • 13,807
  • 9
  • 40
  • 57
jirinovo
  • 2,105
  • 3
  • 26
  • 37

1 Answers1

0

Your radix sort does a lot of allocating and reallocating of memory as it goes. I wonder if, perhaps, that is the issue. What if you only allocated memory once for your data structures, and accepted the fact that you would need to over allocate.

Other than that, have you checked to be certain that the final lists are truly sorted? Have you looked at other statistics for your radix sort (i.e. min/max/median) times, maybe there are occasional outliers, investigating them could help you explain things.

user1245262
  • 6,968
  • 8
  • 50
  • 77
  • Yes, not in code up, but I have checked the final lists with `sortedList == sorted(unsortedList)` and algorithm works reliably. Well I will look at `timeit` module more accurately, because I read several types of usage ([for example here for sorting algorithm](http://stackoverflow.com/a/8220943/1928742)) – jirinovo Dec 28 '14 at 16:13
  • @jirinovo Try rewriting with data structures that aren't allocated as your program runs. With all the appending and extending in your code, I wouldn't be surprised if there's a reasonable chance of triggering garbage collection, swapping to disk or other memory management processes. – user1245262 Dec 28 '14 at 16:34