0

I have a task of finding the steps required to arrange an array of string into a lexicographical order.

After some research, I discovered that the natsort() does the same thing. But I am unable to find its definition nor am I able to find any function that can give me the number of steps required in natsort().

Eg, I have an array with following values

  • Oksana Baiul
  • Michelle Kwan

The number of steps required to sort this array in a lexicographical manner is 1.

So I am interested in getting the number of steps directly rather than implementing any sorting algorithm.

Any help is appreciated.!

Tirthraj Barot
  • 2,671
  • 2
  • 17
  • 33

1 Answers1

3

For counting the comparison steps:

natsort uses a specific compare function during the sorting, which can be called separately as well: strnatcmp.

This means that this:

natsort($data);

... will produce the same result as:

usort($data, 'strnatcmp');

Or, more verbose:

usort($data, function ($a, $b) {
    return strnatcmp($a, $b);
});

That opens the door to counting the number of times this comparison function is called:

$count = 0;
usort($data, function ($a, $b) use (&$count) {
    $count++;
    return strnatcmp($a, $b);
});

For example:

$data = ['test', 'abc10', 'abc2', 'OK', 9, 13];

$count = 0;
usort($data, function ($a, $b) use (&$count){
    $count++;
    return strnatcmp($a, $b);
});

echo "$count comparisons: " . implode(", ", $data);

Outputs:

8 comparisons: 9, 13, OK, abc2, abc10, test

Note that this sorting algorithm is case sensitive: "OK" is sorted before "abc".

For a case insensitive natural sort you can use natcasesort and strnatcasecmp

For counting the swaps

It is not possible to use above method to know how many swaps are made during the sort. PHP uses a version of QuickSort internally, so you could mimic the same kind of algorithm and count the swaps yourself. This will obviously be an estimation, for the following reasons:

  • There are different flavors of Quick Sort, in terms of how they chose the pivot at each recursive step, so this influences the number of swaps they do for a particular input;
  • Some flavors do not swap at all, but create a new array all together and then at the end overwrite the original array with it -- see for instance this algorithm in PHP;
  • Still other flavors do perform swaps, but pick the pivot (in the algorithm) randomly, so the number of swaps will be different on the same inputs;
  • Many implementations will switch to another sorting algorithm when the array has only a few elements, which again influences the number of swaps.

I will provide here the code for what I believe is the most standard algorithm: it chooses one pivot index, right at the middle of the given partition:

class QuickSort {
    private static $swaps = 0;
    private static $cmp = 'test';

    private static function swap(&$array, $i, $j) {
        if ($i == $j) return;
        self::$swaps++;
        list($array[$i], $array[$j]) = [$array[$j], $array[$i]];
    }

    private static function partition(&$array, $begin, $end) {
        $cmp = self::$cmp;
        $index = floor(($begin + $end) / 2);
        $pivot = $array[$index];
        self::swap($array, $index, $end); 
        for ($i = $index = $begin; $i < $end; $i++) {
            if ($cmp($array[$i], $pivot) >= 0) continue;
            self::swap($array, $index++, $i);
        }
        self::swap($array, $index, $end); 
        return $index;
    }

    private static function qsort(&$array, $begin, $end) {
        if ($end <= $begin) return;
        $index = self::partition($array, $begin, $end);
        self::qsort($array, $begin, $index - 1);
        self::qsort($array, $index + 1,  $end);
    }

    public static function sort(&$array, $cmp = 'strcmp') {
        self::$swaps = 0;
        self::$cmp = $cmp;
        self::qsort($array, 0, count($array) - 1);
        return self::$swaps;
    }
}

// Sample input
$data = [1,3,2,8,2,5,4,6];
// Call it: this sort function returns the number of swaps made
$swaps = QuickSort::sort($data, 'strnatcmp');
// Display the result:
echo "$swaps swaps: " . implode(", ", $data);

Outputs:

9 swaps: 1, 2, 2, 3, 4, 5, 6, 8

Again, the number of swaps might be more than you expect in some cases, as Quick Sort is not specifically designed to minimise the number of swaps.

Community
  • 1
  • 1
trincot
  • 317,000
  • 35
  • 244
  • 286
  • Thank you.. But something seems wrong. if I try to sort -Elvis Stojko - Evgeni Plushenko - Kristi Yamaguchi, Then it should return 0 instead it returns 2. – Tirthraj Barot Oct 31 '16 at 09:28
  • There are two things you can count: the number of comparisons, or the number of swaps. This counts the number of comparisons, which really determines the time complexity of an algorithm. Are you looking to count the number of swaps? – trincot Oct 31 '16 at 09:47
  • Yes... and I tried this usort($data, function ($a, $b) use (&$count) { $x = strnatcmp($a, $b); if($x == 1) $count++; return $x; }); – Tirthraj Barot Oct 31 '16 at 09:50
  • I think you cannot derive it that way, because we have no knowledge on what the order is of `$a` and `$b` in the partially sorted array when the call is made. – trincot Oct 31 '16 at 09:54
  • True.. Then how do I get the number of swaps? – Tirthraj Barot Oct 31 '16 at 09:55
  • As [PHP uses Quicksort](http://stackoverflow.com/questions/3165984/what-sort-algorithm-does-php-use) internally, you'd have to write a [quicksort algorithm](https://en.wikibooks.org/wiki/Algorithm_Implementation/Sorting/Quicksort#PHP) which uses `strnatcmp` as comparison function, and count the swaps you do. – trincot Oct 31 '16 at 10:33
  • I would really appreciate if you could update your answer with the quick sort algorithm implementation – Tirthraj Barot Oct 31 '16 at 10:34
  • I added a section about counting swaps. – trincot Oct 31 '16 at 14:51