
I am writing a demo class in Java to analyze the following sorting algorithms:

  1. InsertionSort
  2. SelectionSort
  3. BubbleSort
  4. MergeSort
  5. QuickSort

which I have implemented as static methods in another class named Sort.

I want to compare the best, average, and worst cases of each algorithm by relating the measured runtime to the analytical complexity given in Big-O notation.

In the demo class I only want to measure the time (in nanoseconds) each algorithm needs to sort int arrays of different lengths, with the elements ordered for the best, average, and worst case.

    //Best-Case
    int[] arrbc0 = {1};
    int[] arrbc1 = {1, 2};
    int[] arrbc2 = {1, 2, 3};
    int[] arrbc3 = {1, 2, 3, 4, 5};
    int[] arrbc4 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int[] arrbc5 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};

    //Average-Case
    int[] arrac1 = {1, 2};
    int[] arrac2 = {3, 1, 2};
    int[] arrac3 = {4, 2, 3, 1, 5};
    int[] arrac4 = {9, 1, 10, 6, 2, 4, 8, 3, 7, 5};
    int[] arrac5 = {13, 12, 1, 15, 5, 6, 7, 2, 14, 10, 3, 8, 4, 9, 11};

    //Worst-Case
    int[] arrwc1 = {2, 1};
    int[] arrwc2 = {3, 2, 1};
    int[] arrwc3 = {5, 4, 3, 2, 1};
    int[] arrwc4 = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
    int[] arrwc5 = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1};

    //InsertionSort:
    isNanoTime(arrbc0); //first load
    isNanoTime(arrbc1);
    isNanoTime(arrbc2);
    //...

    public static void isNanoTime(int[] arr) {
        long a1 = System.nanoTime();
        Sort.insertionSort(arr);
        long a2 = System.nanoTime() - a1;
        System.out.println(a2);
    }

Now I have some questions:

  1. Can I use these arrays for all best, average, and worst cases of these algorithms, or does e.g. the worst case of MergeSort require a different order?
  2. Is there an easy way to shuffle (un-sort) the arrays after sorting them once?
  3. Is this even the "right way" to determine the time complexity (maybe someone has a better idea)?
monty.py
  • It can (sort of) demonstrate time complexity, but I don't think it can determine it. Execution time and time complexity are related, but different animals. – hatchet - done with SOverflow Jan 02 '15 at 17:13
    This is going to be a tough one for many reasons, not least of which is http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – NPE Jan 02 '15 at 17:22

3 Answers

  1. Your arrays are way too short: it takes almost no time for any modern CPU to sort them, even in your worst case.
  2. To see meaningful time variations depending on how shuffled the input is, you need a fixed input size that yields measurable times (probably on the order of seconds).
  3. You probably need to generate a set of thousands of random arrays and add a few specific arrays to this set (sorted, reverse-sorted, ...). Then run each algorithm on each array of the set and measure the time needed to sort it. Doing so you obtain a nice distribution graph for each algorithm, on which you can see its behavior (bubble sort goes very high while heapsort stays pretty stable ...). The worst input for one algorithm is not necessarily the worst for another, hence the set.
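A rough sketch of this set-based approach might look as follows. All names here are my own, and `java.util.Arrays.sort` merely stands in for the question's `Sort` methods so the sketch compiles on its own:

```java
import java.util.Arrays;
import java.util.Random;

public class SortBenchmark {

    // Build `count` random arrays of the same fixed size, plus two
    // special cases: a sorted and a reverse-sorted array.
    public static int[][] buildInputSet(int count, int size, long seed) {
        Random rnd = new Random(seed);
        int[][] set = new int[count + 2][];
        for (int i = 0; i < count; i++) {
            int[] arr = new int[size];
            for (int j = 0; j < size; j++) {
                arr[j] = rnd.nextInt();
            }
            set[i] = arr;
        }
        int[] sorted = new int[size];
        int[] reversed = new int[size];
        for (int j = 0; j < size; j++) {
            sorted[j] = j;
            reversed[j] = size - 1 - j;
        }
        set[count] = sorted;
        set[count + 1] = reversed;
        return set;
    }

    // Time one sort run in nanoseconds. Arrays.sort is a placeholder
    // for e.g. Sort.insertionSort(arr) from the question.
    public static long timeSort(int[] arr) {
        long start = System.nanoTime();
        Arrays.sort(arr);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int[] arr : buildInputSet(100, 10_000, 42L)) {
            System.out.println(timeSort(arr));
        }
    }
}
```

Collecting these timings per algorithm gives the distribution graph described above.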
Jean Logeart
  1. Such arrays can demonstrate the worst and best cases for InsertionSort and BubbleSort. Typical implementations of MergeSort and SelectionSort have the same complexity for all inputs. The worst case for the simplest implementation of QuickSort is a sorted (or reverse-sorted) array.
    Wiki page with useful table
    Note that these arrays are too short to notice any difference in run times. Make arrays with 10^3-10^6 elements (for slow and fast algorithms respectively).

  2. Look at the Fisher-Yates shuffle to get a random permutation
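Since java.util.Collections.shuffle only works on Lists, a minimal Fisher-Yates shuffle for an int[] (class and method names assumed) could look like this:

```java
import java.util.Random;

public class Shuffle {

    // Fisher-Yates: walk from the last index down, swapping each
    // element with a uniformly chosen element at or before it.
    public static void shuffle(int[] arr, Random rnd) {
        for (int i = arr.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1); // 0 <= j <= i
            int tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
        }
    }
}
```

This runs in O(n) and produces every permutation with equal probability, so you can re-shuffle an already sorted array before the next measurement.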

MBo

@MBo @Jean Logeart

What do you think about that:

//Main:
for(int n = 100_000; n <= 1_000_000; n = n + 100_000) {
    // e.g. average case of insertion sort:
    int[] arr = randomArray(n);
    insertionSortWithRuntime(arr);
}

/**
 * For best cases: sorted numbers.
 * @param n the length of the array to create.
 * @return a sorted array of length n.
 */
public static int[] sortedArray(int n) {
    int[] arr = new int[n];

    for (int i = 0; i < n; i++) {
        arr[i] = i;
    }
    return arr;
}

/**
 * For average cases: random numbers.
 * @param n the length of the array to create.
 * @return an array of length n filled with random numbers.
 */
public static int[] randomArray(int n) {
    int[] arr = new int[n];

    for (int i = 0; i < n; i++) {
        // note: values are limited to 1..9, so large arrays contain many duplicates
        arr[i] = (int) (Math.random() * 9 + 1);
    }
    return arr;
}

/**
 * For worst cases: reverse-sorted numbers.
 * @param n the length of the array to create.
 * @return a reverse-sorted array of length n.
 */
public static int[] reversedSortedArray(int n) {
    int[] arr = new int[n];

    int length = n - 1;

    for (int i = 0; i < n; i++) {
        arr[i] = length;
        length--;
    }
    return arr;
}
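The insertionSortWithRuntime helper called in the main loop is not shown above; following the isNanoTime pattern from the question, it could look like this (the insertion sort here is only a stand-in for the question's Sort.insertionSort so the sketch is self-contained):

```java
public class Timing {

    // Stand-in for the question's Sort.insertionSort.
    public static void insertionSort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int key = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = key;
        }
    }

    // Times one sort run and prints the elapsed nanoseconds,
    // mirroring isNanoTime from the question.
    public static long insertionSortWithRuntime(int[] arr) {
        long start = System.nanoTime();
        insertionSort(arr); // Sort.insertionSort(arr) in the question's setup
        long elapsed = System.nanoTime() - start;
        System.out.println(elapsed);
        return elapsed;
    }
}
```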

Is this the way you imagined it?

monty.py
  • I proposed too big size for quadratic algorithms (insertion/bubble sorts). Size 1000-10000 would be reasonable. – MBo Jan 02 '15 at 20:27
  • Ok, I use this now: for(int n = 10_000; n <= 100_000; n = n + 10_000) {...} I have another question: now I want to compare the empirical with the analytical data by transferring the data manually into Excel (and plotting a graph). E.g. for insertion sort average and worst case, the big O notation is O(n²), so for an array of length 10,000 you expect 100,000,000 (in which unit?!) and I measure e.g. 93 milliseconds?! I am a little bit confused now.. @Jean Logeart – monty.py Jan 02 '15 at 20:50
  • for n = 10000, 20000, 30000, 40000 you will get times like 100 ms, 400 ms, 900 ms, 1600 ms etc - this is quadratic dependence, and the graph (time vs. n) will look like a parabola – MBo Jan 02 '15 at 21:05
  • But how do you know that 10,000 fields in an array correspond to an input of 10 (milliseconds) in the big O formula? Then for O(n) it would have to be 10, 20, 30, but something like 100, 200, 300.. would fit better here – monty.py Jan 02 '15 at 21:09
  • Also it's curious that QuickSort doesn't need more than 62 millis even for arrays with 10,000 - 1,000,000 elements?! – monty.py Jan 02 '15 at 21:48
  • You cannot measure big O of algorithm implementation with one input size. To evaluate (not measure exactly) big O, you need timings for series of different input size. I'd recommend Sedgewick's book Algorithms in Java – MBo Jan 03 '15 at 08:21
  • I know, the big O only represents the dominant term.. for a better evaluation you need something like c*g(n)+k.. but it's acceptable for me to just use big O with one input size, n. I only want to know how to convert the array size into n? For some cases 10,000 = 10 works well, for others 10,000 = 100 looks better?! Maybe there is a table or something that says you need 10 millisecs for 1 field or something?! I hope you understand what I mean. – monty.py Jan 03 '15 at 16:15
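The check MBo describes in these comments can be made concrete without converting n into any time unit: estimate the exponent k in t ≈ c·n^k from two measurements, since k = log(t2/t1) / log(n2/n1). A small sketch (names are my own):

```java
public class GrowthCheck {

    // Estimate the exponent k in t ≈ c * n^k from two (size, time) pairs.
    // For an O(n^2) sort, doubling n should roughly quadruple the time,
    // giving k ≈ 2; the constant factor c cancels out.
    public static double estimateExponent(double t1, double t2,
                                          double n1, double n2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }
}
```

With MBo's example timings (100 ms at n = 10,000 and 400 ms at n = 20,000) this yields k ≈ 2, confirming quadratic growth regardless of the machine's speed.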