
I wrote Quicksort and Mergesort and a benchmark for them, to see how fast they are.

Here is my code:

#------------------------------Creating a Random List-----------------------------#



def create_random_list(length):

    import random

    random_list = list(range(length))
    random.shuffle(random_list)

    return random_list

# Initialize default list length to 0
random_list = create_random_list(0)

# Testing random list function
print ("\n" + "That is a randomized list: " + "\n")
print (random_list)
print ("\n")



#-------------------------------------Quicksort-----------------------------------#



"""

Recursive Divide and Conquer Algorithm

+ Very efficient for large data sets
- Performance depends largely on pivot selection

Time Complexity
   --> Worst-Case -----> O(n^2)
   --> Best-Case  -----> Ω(n log(n))
   --> Average Case ---> O(n log(n))

Space Complexity
   --> O(log(n))

"""

# Writing the Quick Sort Algorithm for sorting the list - Recursive Method
def qsort (random_list):


    less = []
    equal = []
    greater = []

    if len(random_list)>1:

        # Choose the first element as the pivot
        pivot = random_list[0]

        for x in random_list:

            if x < pivot:
                less.append(x)

            elif x == pivot:
                equal.append(x)

            elif x > pivot:
                greater.append(x)

        return qsort(less) + equal + qsort(greater)

    else:
        return random_list

"""
A compact Quick Sort in Python:

def qsort(L):
    if len(L) <= 1: return L
    return qsort([lt for lt in L[1:] if lt < L[0]]) + L[0:1] + \
           qsort([ge for ge in L[1:] if ge >= L[0]])

"""

# Calling Quicksort
sorted_list_qsort = qsort(random_list)

# Testing Quicksort
print ("That is a sorted list with Quicksort: " + "\n")
print (sorted_list_qsort)
print ("\n")



#-------------------------------------FINISHED-------------------------------------#






#-------------------------------------Mergesort------------------------------------#



"""

Recursive Divide and Conquer Algorithm

+ Guaranteed O(n log(n)) regardless of input
- Requires O(n) additional space

Time Complexity
   --> Worst-Case -----> O(n log(n))
   --> Best-Case  -----> Ω(n log(n))
   --> Average Case ---> O(n log(n))

Space Complexity
   --> O (n)

"""

# Create a merge algorithm
def merge(a,b): # Let a and b be two arrays

    c = [] # Final sorted output array

    a_idx, b_idx = 0,0 # Read indices into the a and b arrays

    while a_idx < len(a) and b_idx < len(b):

        if a[a_idx] < b[b_idx]:
            c.append(a[a_idx])
            a_idx+=1

        else:
            c.append(b[b_idx])
            b_idx+=1

    if a_idx == len(a): c.extend(b[b_idx:])
    else:               c.extend(a[a_idx:])
    return c

# Create final Mergesort algorithm
def merge_sort(a):

    # A list of zero or one elements is sorted by definition
    if len(a)<=1:

        return a

    # Split the list in half and call Mergesort recursively on each half
    left, right = merge_sort(a[:len(a)//2]), merge_sort(a[len(a)//2:])

    # Merge the now-sorted sublists with the merge function which sorts two lists
    return merge(left,right)

# Calling Mergesort
sorted_list_mgsort = merge_sort(random_list)

# Testing Mergesort
print ("That is a sorted list with Mergesort: " + "\n")
print (sorted_list_mgsort)
print ("\n")



#-------------------------------------FINISHED-------------------------------------#





#------------------------------Algorithm Benchmarking------------------------------#



# List lengths to benchmark
n = [100,1000,10000,100000]

# Creating a dictionary for times of algorithms
times = {"Quicksort":[], "Mergesort": []}

# Import time for analyzing the running time of the algorithms 
from time import time

# Loop over the lengths in n and measure each algorithm's running time
for size in range(len(n)):

    random_list = create_random_list(n[size])
    t0 = time()
    qsort(random_list)
    t1 = time()
    times["Quicksort"].append(t1-t0)

    random_list = create_random_list(n[size-1])
    t0 = time()
    merge_sort(random_list)
    t1 = time()
    times["Mergesort"].append(t1-t0)

# Print a table showing the benchmark results
print ("n\tMerge\tQuick")
print ("_"*25)

for i,j in enumerate(n):
    print ("{}\t{:.5f}\t{:.5f}".format(j, times["Mergesort"][i], times["Quicksort"][i]))



#----------------------------------End of Benchmarking---------------------------------#

The code is well documented and runs perfectly with Python 3.8. You may copy it into a code editor for better readability.

--> My question, as the title states:

Is my benchmarking right? I'm doubting it a little bit, because the running times of my algorithms seem a little odd. Can someone confirm my runtimes?

--> Here is the output of this code:

That is a randomized list: 

[]


That is a sorted list with Quicksort: 

[]


That is a sorted list with Mergesort: 

[]


n       Merge   Quick
_________________________
100     0.98026 0.00021 
1000    0.00042 0.00262 
10000   0.00555 0.03164 
100000  0.07919 0.44718 

--> If someone has another/better code snippet on how to print the table - feel free to share it with me.

philuix
  • In your argument to `create_random_list` you do `n[size]` with one algorithm but `n[size-1]` with the other - mistake? -- The sample size of 100 probably has too much static overhead to be useful. – 500 - Internal Server Error Apr 21 '20 at 10:57
  • If the goal is to compare algorithms, a compiled language might be better than python, which can be 50 or so times as slow as a C / C++ version of the same algorithm. The merge sort code is creating and copying sub-arrays, which can be avoided with a one time creation of a working array. The copying can be reduced to a one time copy and alternating direction of merge based on level of recursion, as shown in the [wiki article](https://en.wikipedia.org/wiki/Merge_sort#Top-down_implementation). The one time copy can be avoided as well, but requires a bit more code. – rcgldr Apr 21 '20 at 16:54

1 Answer


The error is in n[size-1]: when size is 0 (the first iteration), this translates to n[-1], which corresponds to your largest size. So in the first iteration you are comparing qsort on a 100-element list with merge_sort on a 100000-element list, which obviously favours the first a lot. It doesn't help that you call this variable size, as it really isn't the size, but the index into the n list, which contains the sizes.
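A quick way to see the off-by-one, using the n list from your code:

```python
n = [100, 1000, 10000, 100000]

size = 0              # first iteration of the loop
print(n[size])        # 100    -> length passed to qsort
print(n[size - 1])    # 100000 -> n[-1] wraps around to the largest length
```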

So remove the -1, or even better: iterate directly over n. And I would also make sure both sorting algorithms get to sort the same list:

for size in n:
    random_list1 = create_random_list(size)
    random_list2 = random_list1[:]

    t0 = time()
    qsort(random_list1)
    t1 = time()
    times["Quicksort"].append(t1-t0)

    t0 = time()
    merge_sort(random_list2)
    t1 = time()
    times["Mergesort"].append(t1-t0)

Finally, consider using timeit, which is designed for measuring performance.
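A rough sketch of what that could look like, using sorted() as a stand-in (substitute your qsort or merge_sort):

```python
from timeit import timeit
import random

def create_random_list(length):
    random_list = list(range(length))
    random.shuffle(random_list)
    return random_list

data = create_random_list(10000)

# number=10 runs the statement 10 times and returns the total elapsed seconds;
# the lambda copies the list each run so earlier runs don't pre-sort the input.
elapsed = timeit(lambda: sorted(data[:]), number=10)
print("10 runs on 10000 elements: {:.5f}s".format(elapsed))
```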

trincot