0

This is actually a follow-up of this thread: Perl: Numerical sort of arrays in a hash

I couldn't edit the original question because my current code is a bit different, so I'm just asking this as another question.

Okay after using the Schwarzian Transform, I have this:

my @mylines =("0.899 0.92 cat", 
            "9.999 0.001 dog",
            "-0.52 0.3 humpty", 
            "13.52 0.09 bumbo",
            "-1.52 0.98 nanny",
            "3.52 0.34 lala");

my @sorted = map { join ' ', @$_ }
             reverse sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] }
             map { [ split ] } (@mylines);

print "$_\n" for @sorted;

I would expect the output to be sorted first by the first column, then the second, but it turns out like this:

9.999 0.001 dog
3.52 0.34 lala
13.52 0.09 bumbo
0.899 0.92 cat
-1.52 0.98 nanny
-0.52 0.3 humpty

I suppose that's because it's doing a character sort... but I need it numerically sorted. Do I need to put a java-like "parseInt()" somewhere in the sort function?

Community
  • 1
  • 1
user961627
  • 12,379
  • 42
  • 136
  • 210

1 Answers1

3

You are doing a cmp on the first column. That means it sorts it as text. Your sort function should be:

sort { $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] } ...

Which makes the output:

13.52 0.09 bumbo
9.999 0.001 dog
3.52 0.34 lala
0.899 0.92 cat
-0.52 0.3 humpty
-1.52 0.98 nanny

However, you should never need to put a reverse before sort, because you can always reverse the terms:

sort { $b->[0] <=> $a->[0] or $b->[1] <=> $a->[1] } ...
Axeman
  • 29,660
  • 2
  • 47
  • 102
  • One could also argue that it's better to always use `reverse` since that makes it more obvious that you're doing a reverse sort. Plus, using `reverse` means you don't have to specify an expression to sort with in many cases. – jamessan Oct 31 '11 at 13:07
  • @Jamessan - while readability is a very valid concern, in this case adding O(N) operations to reverse an array may be too high a cost to pay for it. Better to add a comment exlplaining that b<=>a is a reverse). – DVK Oct 31 '11 at 14:30
  • @DVK : [According to hobbs](http://stackoverflow.com/q/7585145/133939) the optimizer understands that `reverse sort { $a <=> $b}` is the equivalent to `sort { $a <=> $b }`. I guess that means no additional overhead is incurred – Zaid Oct 31 '11 at 16:30
  • @jamessan, Since each field is sorted individually, having one `reverse` at the front doesn't add readability. – ikegami Oct 31 '11 at 17:29
  • @DVK, Iif you're going to use Big-Oh analyis, O(N)+O(N log N) is still O(N log N). – ikegami Oct 31 '11 at 17:31
  • @Zaid, `sort { $a <=> $b }` and `reverse sort { $a <=> $b }` are indeed special cases, but neither is being used here. `reverse sort { f($a) <=> f($b) }` is optimised some too, but not to the same extent. (`sort` quickly swaps the order of the elements on the stack itself before returning instead of using a `reverse` op to do it.) Overhead *is* incurred. – ikegami Oct 31 '11 at 17:39
  • My earlier comment should've read "`reverse sort { $a <=> $b}` is equivalent to `sort { $b <=> $a}`" – Zaid Oct 31 '11 at 17:40
  • @Zaid, True, but my reply still applies. – ikegami Oct 31 '11 at 17:41
  • 1
    @ikegami : I'm curious as to where this behavior is documented. Never came across it in perldoc. Seems like a good question to ask. – Zaid Oct 31 '11 at 17:43
  • @Zaid, it's an optimisation, and those aren't documented. See OPpSORT_REVERSE in pp_sort.c – ikegami Oct 31 '11 at 17:44
  • @ikegami, that's why I said "in many cases" and not "in all cases". People should obviously evaluate what the trade-offs are in the code they're writing, such as profiling to see if the overhead of an explicit reverse really matters in the larger scheme of a specific code base. Neither answer is correct in all cases, which is what I was trying to show by going to the oppose extreme of "never use `reverse`." – jamessan Oct 31 '11 at 18:26
  • @Zaid - please ask or I'll steal the idea and ask :)))) Excellent point @ikegami! – DVK Oct 31 '11 at 19:33
  • @jamessan compare "never need to put a reverse" to "never use reverse". – Axeman Oct 31 '11 at 19:53
  • @jamessan, That's not true, you said it some argue it's *always* *better*: "One could also argue that it's better to always use `reverse` since that makes it more obvious that you're doing a reverse sort." The "in many cases" applied to something else you said. – ikegami Oct 31 '11 at 22:27
  • @jamessan, it seems to me that what I put in the sort block is specification of *how* I want it sorted. I think it's more confusing to say "... and take the reverse of that" rather than just specify it. I'd be more inclined to use `reverse sort @array`, knowing the base behavior. – Axeman Oct 31 '11 at 22:36