
How does one go about determining the different types of sorting algorithms without being able to look at code?

The scenario is that our lecturer showed a .jar file with 4 sorting methods in it (sortA, sortB...) that we can call to sort a vector.

So, as far as I know, we can only measure and compare the times for the sorts. What's the best method to determine the sorts?

The main issue is that the sorts don't take very long to begin with (the times differ by only ~1000 ms), so comparing them that way isn't really an option, and so far none of the data sets I've tried (ascending, descending, nearly sorted) have produced much variation in the sort times.
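One way to get more signal out of the timings is to double the input size repeatedly and watch the ratio between consecutive runs. This is only a sketch: `sortUnderTest` below is a stand-in (here `Collections.sort`) for the jar's `sortA`/`sortB` methods, which aren't available outside the exercise.

```java
import java.util.Collections;
import java.util.Random;
import java.util.Vector;

public class TimingProbe {
    // Stand-in for the jar's sortA/sortB/... methods (hypothetical; replace
    // with a call into the actual jar).
    static void sortUnderTest(Vector<Integer> v) {
        Collections.sort(v);
    }

    // Fill a vector with n random ints and time one sort of it.
    static long timeSort(int n, Random rnd) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < n; i++) v.add(rnd.nextInt());
        long start = System.nanoTime();
        sortUnderTest(v);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long prev = 0;
        // Double n each round: an O(n^2) sort's time roughly quadruples,
        // an O(n log n) sort's roughly doubles.
        for (int n = 100_000; n <= 1_600_000; n *= 2) {
            long t = timeSort(n, rnd);
            System.out.printf("n=%d  time=%dms  ratio=%.2f%n",
                    n, t / 1_000_000, prev == 0 ? 0.0 : (double) t / prev);
            prev = t;
        }
    }
}
```

For stable ratios you would want to run each size several times and average, since JVM warm-up can distort the first measurements.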

  • "don't take very long" – then increase the vector size. You should start noticing differences soon. – Amadan Mar 19 '15 at 02:07
  • Open the jar file, look at the code and pretend to be a magician: http://stackoverflow.com/questions/5107187/extract-source-code-from-jar-file – Jeroen Vannevel Mar 19 '15 at 02:07
  • Start by using a very large data set. – Jason Mar 19 '15 at 02:07
  • Measuring the time might not be your only option for this... do any of the sorting methods also take a `Comparator`? Also, you should start looking for pathological data sets. – Elliott Frisch Mar 19 '15 at 02:07
  • @JeroenVannevel: this is in an academic context, so what you are suggesting would likely be considered cheating. – Amadan Mar 19 '15 at 02:08
  • You should start by using a much larger data set and observe all the times. Then *double* the data set and observe the new times. Repeat and average. You should be able to learn quite a lot from how the times vary with *N.* Specifically, you will be able to weed out the *O(N^2)* algorithms such as bubble sort. – user207421 Mar 19 '15 at 02:57
  • @Amadan: With a 10,000-integer vector I still only get the following four values (milliseconds): 2683, 1235, 1, 27. The third and fourth values are always very small. – Firo Proncho Mar 19 '15 at 07:01
  • @JeroenVannevel: That defeats the purpose of the exercise; we aren't getting marked, it's purely academic. – Firo Proncho Mar 19 '15 at 07:04
  • @ElliottFrisch: No comparators that I am aware of. – Firo Proncho Mar 19 '15 at 07:04
  • @EPJ: That's a good idea, I'll start with that, thanks! – Firo Proncho Mar 19 '15 at 07:04
  • You seem to be under the impression that 10000 is a large array. If 10000 is fast, try 100000, 1000000, 10000000... And test each algorithm separately: in order to get useful data for some of them, you will have to test on dataset sizes that would make other algorithms run for longer than you'll be willing to wait. – Amadan Mar 19 '15 at 07:40
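The pathological data sets suggested in the comments can be generated with a few small helpers. This is a sketch using the question's `Vector<Integer>` setup; the generator names are my own, not part of the exercise:

```java
import java.util.Collections;
import java.util.Random;
import java.util.Vector;

public class DataSets {
    static Vector<Integer> ascending(int n) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < n; i++) v.add(i);
        return v;
    }

    static Vector<Integer> descending(int n) {
        Vector<Integer> v = ascending(n);
        Collections.reverse(v);
        return v;
    }

    // Sorted except for a handful of random swaps: insertion sort is
    // near-linear here, while selection sort still takes its full
    // quadratic time.
    static Vector<Integer> nearlySorted(int n, int swaps, Random rnd) {
        Vector<Integer> v = ascending(n);
        for (int s = 0; s < swaps; s++)
            Collections.swap(v, rnd.nextInt(n), rnd.nextInt(n));
        return v;
    }

    // Many duplicate keys: some naive quicksort variants degrade on this.
    static Vector<Integer> fewUnique(int n, int distinct, Random rnd) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < n; i++) v.add(rnd.nextInt(distinct));
        return v;
    }
}
```

Feeding each of the four mystery sorts all of these inputs at the same large size, and comparing the timings side by side, tends to separate the algorithms more sharply than random data alone.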

1 Answer


I would create a small, known data set and do the following:

  1. Sort the data on paper using known sorting algorithms and document the expected behavior on paper. Be very specific on what happens to your data after each pass.
  2. Run a test in debug mode and step through the sorting process.
  3. Observe how the elements are being sorted and compare it with your predictions (notes) obtained in step 1.

I think this should work. Basically, you are using the scientific method to determine which algorithm is being used in a given method. This way, you don't have to resort to "cheating" by decompiling code or rely on imprecise methodologies like measuring execution time. The three-step process I outlined relies on solid, empirical data to arrive at your conclusions.

hfontanez