String combination in Haskell

Question

I'm writing an algorithm in Haskell that simplifies context-free grammars and I've been struggling with the removal of null productions, more specifically with the "substitution" of nullable non-terminals in the other productions.

Given a string, let's say "ASA", I would like to return a list of strings built by removing the character "A" one, two, ... every time it appears. To be clear, given "ASA" I'd like to return this: ["SA", "AS", "S"].

In Python I did it quite easily, but in Haskell I don't know how to iterate over the string and manipulate it how I'd like to. Probably because I'm still not used tu functional programming.

The usual way to write this in Haskell would be a recursive definition using cases for non-empty and empty strings: `removeAs (x:xs) = ...; removeAs "" = ...`. Have you tried anything like that? If so, can you share your code? — K. A. Buhr, Aug 15 '20 at 16:42
As an alternative to using handcrafted recursions, you can also try to make it by combining library functions. That somehow forces you to learn your way thru the runtime library. Say you could define `expand ch = if (ch == 'A') then [ Just ch, Nothing ] else [ Just ch ]` and then work from `sequence (map expand "ASA")`. Function [sequence](https://hoogle.haskell.org/?hoogle=sequence) is the list cartesian product. Please note things are much easier for us if you provide at the very least for your function a tentative name, a type signature and argument names. — jpmarinier, Aug 16 '20 at 09:09

score 1 · Answer 1 · answered Aug 17 '20 at 14:09

A library-based approach:

A given input character may or may not be in any of the output partial strings. So it seems natural to involve the Haskell Maybe type transformer. It is similar to std::optional in C++.

We can have an expand function that associates to each input character a list of the corresponding possibilities:

$ ghci
 λ> 
 λ> st = "ASA"
 λ> 
 λ> expand ch = if (ch == 'A') then [ Just ch, Nothing ] else [ Just ch ]
 λ> 
 λ> map expand st
 [[Just 'A',Nothing],[Just 'S'],[Just 'A',Nothing]]
 λ>

What we need is basically the Cartesian product of the above lists of possibilites. A list Cartesian product can be obtained by using the highly polymorphic sequence library function:

 λ> 
 λ> sequence (map expand st)
[[Just 'A',Just 'S',Just 'A'],[Just 'A',Just 'S',Nothing],[Nothing,Just 'S',Just 'A'],[Nothing,Just 'S',Nothing]]
 λ>

Next, we need to change for example [Just 'A', Just 'S', Nothing] into ['A', 'S'], which in Haskell is exactly the same thing as "AS". The required function would have as its type signature:

func :: [Maybe α] -> [α]

If we submit this candidate type signature into Hoogle, we readily get library function catMaybes:

 λ> 
 λ> import qualified Data.Maybe as Mb
 λ> 
 λ> Mb.catMaybes [Just 'A',Just 'S',Nothing]
 "AS"
 λ> 
 λ> map Mb.catMaybes (sequence (map expand st))
 ["ASA","AS","SA","S"]
 λ>

and we just have to remove the full string "ASA" from that last list.

Of course, there is no need to restrict this to the Char data type. Any type with a proper equality test can do. And the privileged character 'A' should be made into a variable argument. Overall, this gives us the following code:

import qualified Data.Maybe as Mb

multiSuppressor :: Eq α => α -> [α] -> [[α]]
multiSuppressor e xs =
    let  expand e1 = if (e1 == e) then [ Just e1, Nothing ] else [ Just e1 ]
         maybes  = sequence  (map expand xs)
         res1    = map  Mb.catMaybes  maybes
    in
         -- final massaging as the whole list is normally unwanted:
         if (null xs)  then  [[]]  else  filter (/= xs) res1

A note on efficiency:

Function sequence is polymorphic. Being the list cartesian product is not its sole role in life. Unfortunately, this happens to have the sad side effect that its memory consumption can become quite large if you go beyond toy-sized examples.

If this becomes a problem, one can use the following replacement code instead, which is based on an idea by K. A. Buhr:

cartesianProduct :: [[α]] -> [[α]]
cartesianProduct xss =
    map reverse (helper (reverse xss))
        where
            helper [] = [[]]
            helper (ys:zss) =  [y:zs | zs <- helper zss, y <- ys]

`if (e1 == e) then [ Just e1, Nothing ] else [ Just e1 ]` can also be written a bit more compactly as `Just e1 : [Nothing | e1 == e]`, and `sequence (map expand xs)` is a specialised version of `sequenceA (fmap expand xs)`, which is equal to `traverse expand xs` — Jon Purdy, Aug 17 '20 at 23:12
You could put it like this, using `Data.Functor.Reverse`: `traverse2 f xs = fmap (reverse . getReverse) $ traverse f (Reverse (reverse xs))`. I think this does a pretty good job of explaining that the traversal is in the same order, only reassociated — dfeuer, Oct 29 '20 at 17:50
@dfeuer - Thanks a lot for the info - I might have to dive into it a little while before making full sense of it. — jpmarinier, Oct 29 '20 at 20:34
For any *lawful* `Applicative` `g`, any *finite* `xs :: [a]`, and any function `f :: a -> f b`, `traverse2 f xs = traverse f xs`. Which is better depends on the `Applicative` and perhaps other circumstances. — dfeuer, Oct 29 '20 at 21:13
Think of `Reverse . reverse` as converting a list to a snoc list: `1:(2:(3:[])) -> ((Nil :> 1) :> 2) :> 3`. — dfeuer, Oct 29 '20 at 21:19

String combination in Haskell

1 Answers1

A library-based approach:

A note on efficiency:

Linked