1

I'm trying to complete problem 8 of the Haskell Ninety-Nine problems however I'm having issues understanding why the list result of my function is ordered wrong.

The aim of the compress function is to eliminate any duplicate letters from the input list and output another list that contains the unique letters in the order in which they first appear in the input list. Here is my code for the compress function:

compress l = foldr f [] l where f a b = if a `elem` b then b else a : b

It functions correctly when the duplicate letters are adjacent to each other hence "aaaabbb" outputs "ab" which is correct however when a duplicate letter is separated by another letter it changes its order in the output hence "aba" outputs "ba" whereas the expected output is "ab".

Even when writing out the stack trace for foldr I seem to get the expected output however when running the code in the GHCI with an input such as "aba" or "abca" I get the incorrect result. What causes this behaviour? Why is it that when a duplicate letter is separated by a different letter the order of the output is changed?

Will Ness
  • 70,110
  • 9
  • 98
  • 181
Pzet
  • 470
  • 2
  • 4
  • 11
  • `b` is the result of the *tail* being folded, so ``a `elem` b`` is only false for the *last* occurrence of `a` in the string. With `"aaaabbb"`, it's still the last occurrence of each character being added to the result; the last occurrence just happens to be in the same "cluster" as the first. – chepner Feb 04 '22 at 15:01
  • An inefficient *left* fold that works would be `compress = foldl (\acc x -> if elem x acc then acc else acc ++ [x]) []`. See https://stackoverflow.com/questions/6172004/writing-foldl-using-foldr for converting `compress` from `foldl` to `foldr`. – chepner Feb 04 '22 at 15:05
  • @chepner, while that technique is the right technique, that code is not the right code. The thing to do here is to start by writing an efficient function (using a recursive helper) and then converting *that* to a (right) fold. `foldl` has the wrong shape to do this well. – dfeuer Feb 04 '22 at 16:26
  • 1
    Part of the reason I posted the "wrong" code is to avoid writing the correct one for the OP. It was just to demonstrate working on the right "end" of the list. – chepner Feb 04 '22 at 16:35
  • There is this [old SO answer](https://stackoverflow.com/a/3100764/11282404) about the elimination of duplicates in a list. And if the rules of the game ban the use of `Set` or you have no `Ord` instance, you can always use another list as the computation state instead of the Set. – jpmarinier Feb 05 '22 at 10:46

2 Answers2

4

The foldr function starts with an initial value (of the type b) and a Foldable container t. For 'each' a value in the container, it calls the function (a -> b -> b) with the a value and the 'current' b value.

Prelude> :t foldr
foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b

In compress, the initial value is [], which also enables the compiler to infer that the Foldable instance is [].

Now try out the steps in GHCi. Define (just for the GHCi session) f as a top-level function:

Prelude> f a b = if a `elem` b then b else a : b

If the input is "aba", the first time f is called, the b value will be [] and the a value will be 'a', because foldr folds from the right.

Prelude> f 'a' []
"a"

The return value "a" now becomes the accumulator value b for the next time around:

Prelude> f 'b' "a"
"ba"

This is because f conses 'b' onto "a".

The accumulator value is now "ba". Pass it to f again, using the third and final value in the list:

Prelude> f 'a' "ba"
"ba"

See e.g. this other answer that outlines an interactive way to explore and 'debug' Haskell functions.

Mark Seemann
  • 225,310
  • 48
  • 427
  • 736
  • It's funny. "Third and final" is definitely idiomatic, but in this case it might be a bit misleading. It's temporally final but semantically initial -- it's the first element of the list that's being folded! Especially the continuation "final value in the list" makes it seem like the semantic interpretation is intended. – Daniel Wagner Feb 04 '22 at 15:29
  • @DanielWagner You're right. I was thinking of the order of evaluation of the elements when I wrote that. – Mark Seemann Feb 04 '22 at 15:48
  • The `Foldable` instance is `[]`, not `[] Char`. I know what you mean by that statement, but the OP probably won't grok your shorthand. – dfeuer Feb 04 '22 at 16:23
  • @dfeuer Well, that I can edit (edited). – Mark Seemann Feb 04 '22 at 18:29
  • I see the logic here however how come when the input is changed to "ab" as opposed to "aba" the output is no longer "ba" but instead "ab"? Surely the same steps would follow meaning a would be added first then the cons operator would be applied with b leading to "ba" which should be the final output. – Pzet Feb 04 '22 at 19:35
  • 1
    @Pzet Try walking through the steps of `f` as shown above, but using `"ab"` instead of `"aba"` as the input. – Mark Seemann Feb 04 '22 at 21:26
  • @MarkSeemann Thank you, I see why this is happening now. – Pzet Feb 05 '22 at 16:21
2
compress l = foldr f [] l where f a b = if a `elem` b then b else a : b

... "aba" outputs "ba" whereas the expected output is "ab".

foldr on lists is very simple. It is defined so that

foldr g z [a,b,c,...,n]  =  g a (foldr g z [b,c,...,n])

-- and by generalization,
foldr g z [          n]  =  g n z

-- which means
foldr g z [           ]  =      z

That's really all there is to it. Looks self-explanatory, isn't it? Just re-write your foldr call, syntactically, to immediately see exactly what's going on. In particular, re-writing your definition a little bit, we have

compress xs  =  foldr f [] xs
    where
    f a r  =  if  elem a r  then      r 
                            else  a : r

(I do (try to) always call the second argument to the combining function "r", for the "Recursively calculated Result for the Rest of the input list".)

Thus we have

compress             [a,b,c,...,n]
   =  foldr f []     [a,b,c,...,n]  
   =  f a ( foldr f [] [b,c,...,n] )
   =  if  elem a r  then      r 
                    else  a : r
        where 
        r = foldr f [] [b,c,...,n]
   =  if  elem a r  then      r 
                    else  a : r
        where 
        r = compress   [b,c,...,n]

(using where as if it where a part of the expression, like a "flipped let"). In other words,

compress (x:xs) = if  elem x   r  
                    then       r
                    else   x : r
                  where        r = compress xs
compress []     = []

This reads: "if x is present in the compressed rest of input, skip it; else, keep it; and continue with the compressed rest". (so, calling it b is misleading; suggests a and b are similar entities; they are not -- a (or x) is an element of the list, and r is its transformed tail).

So you see where the problem is: if there are more then one as in the list, this definition keeps the last one, whereas you want to keep the first.

So if there are several of them in a row this difference is unobservable:

       -- 1.                    -- 2.
     a a a a b b b            a a a a b b b
           a     b            a       b

but if there's an intervening character then of course we can see the difference:

       -- 1.                    -- 2.
     a a a a b b b a          a a a a b b b a
                 b a          a       b
Will Ness
  • 70,110
  • 9
  • 98
  • 181