2

Obviously, this is just an example I'm using to understand Haskell. I'm not planning to use bubble sort for sorting, nor would I normally care to optimize away a single iteration through a list in a quadratic algorithm.

I was hoping to stop bubble sort when the list is already sorted. I wrote this code where bubble returns, as a second item in the tuple, the boolean status that tells the caller whether any swaps were made:

bubble (x:y:ys)
    | x > y = (y : fst (bubble (x:ys)), True)
    | otherwise = (x : fst (bubble (y:ys)), snd (bubble (y:ys)))
bubble x = (x, False)
bubblesort xs
    | snd (bubble xs) = bubblesort (fst (bubble xs))
    | otherwise = fst (bubble xs)

In languages I'm familiar with, it would be horribly inefficient because bubble (y:ys) and bubble xs would be recomputed twice without explicit memoization. But given that Haskell functions don't have side effects, can I assume that the compiler will optimize the duplicate calls away, making my code efficient?

If not, what's a good way to write efficient bubble sort?

I have seen the approach which checks if the list is sorted:

bubbleSort (x:y:xs) = if sorted thisSort then thisSort else bubbleSort thisSort
    where thisSort = (min x y) : bubbleSort ((max x y):xs)
bubbleSort x = x
sorted (x:y:xs) = if x <= y then sorted (y:xs) else False
sorted x = x

But I am trying to avoid the one additional iteration through the list (once bubbleSort performed no swaps, one more list traversal to verify that the list is sorted is unnecessary).

max
  • 49,282
  • 56
  • 208
  • 355
  • "Efficient bubble sort" : - there's no such thing!! it's O(N^2) (as you know) – Mitch Wheat Mar 24 '17 at 02:40
  • @MitchWheat Yes, I meant in the narrow sense of avoiding extra iterations. I just randomly ended up with this silly example. – max Mar 24 '17 at 02:43
  • 1
    O(N^2) would be quite efficient indeed, compared to the exponential time spent by the implementation posted in this question! – amalloy Mar 24 '17 at 02:55

2 Answers2

6

In languages I'm familiar with, it would be horribly inefficient because bubble (y:ys) and bubble xs would be recomputed twice without explicit memoization. But given that Haskell functions don't have side effects, can I assume that the compiler will optimize the duplicate calls away, making my code efficient?

No, you can't assume that. This optimisation, called common subexpression elimination, can be a rather tricky trade-off, as you are exchanging time for space. For instance, if the duplicated expression is something that would eat up a lot of memory, it might be preferrable to compute it twice so that you can consume it immediately, rather than computing it just once and leaving it around in memory. For that reason, GHC is pretty conservative when it comes to applying common subexpression elimination. See this question about an example rather similar to yours, and this one for in-depth discussion.

(In your case, of course, not wanting to compute bubble (y:ys) twice is entirely reasonable. You can sidestep the whole issue by adding an auxiliary definition, as Lazersmoke's answer illustrates.)

Community
  • 1
  • 1
duplode
  • 33,731
  • 7
  • 79
  • 150
3

Use a where clause:

bubble (x:y:ys)
  | x > y = (y : fst (bubble (x:ys)), True)
  | otherwise = (x : fst yys, snd yys)
  where
    yys = bubble (y:ys)
bubble x = (x, False)
bubblesort xs
  | snd bxs = bubblesort (fst bxs)
  | otherwise = fst bxs
  where
    bxs = bubble xs

GHC will probably figure out how to optimize it to this anyway, but this makes it much more likely since it gives an explicit binding to memoize, rather than having to find duplicated code.

Lazersmoke
  • 1,741
  • 10
  • 16
  • 3
    If I had to make a guess, I would say GHC won't deduplicate the `bubble (y:ys)` in the question, for the reasons I mention in my answer. (But then again, I didn't actually look at what Core comes out of the OP's code...) – duplode Mar 24 '17 at 02:45