Question
How to implement a run length encoding modulus n>=1
? For n=4
, considering the inputAAABBBBABCCCCBBBDAAA
, we want an output of [('D', 1), ('A', 3)]
. Note the long-distance merging due to the modulus operation. See Explanation.
Explanation
The first occurance of BBBB
encodes to (B, 4)
whose modulus 4
is (B, 0)
, thus canceling itself out. See the diagram (ignore spaces; they are simply for illustrative purposes):
AAABBBBABCCCCBBBDAAA
A3 B4 ABCCCCBBBDAAA
A3 B0 ABCCCCBBBDAAA
A3 ABCCCCBBBDAAA
A4 BCCCCBBBDAAA
A0 BCCCCBBBDAAA
BCCCCBBBDAAA
...
DA3
A simpler example when no merging happens since none gets canceled by modulus 4
: input AAABABBBC
produces output [('A',3),('B',1),('A',1),('B',3),('C',1)]
.
Requirements
- Haskell implementations are preferred but others are welcome too!
- Prefer standard/common library functions over 3rd party libraries.
- Prefer readable and succint programs utilizing higher-order functions.
- Prefer efficiency (do not loop over the whole list whenever unnecessary)
My program
I implemented this in Haskell, but it looks too verbose and awful to read. The key idea is to check three tuples at a time, and only advance one tuple forward if we can neither cancel out 0
tuples nor merge a pair of tuples among the three tuples at hand.
import Data.List (group)
test = [('A', 1), ('A', 2), ('B', 2), ('B', 2), ('A', 1), ('B', 1), ('C', 1), ('C', 3), ('B', 3), ('D', 1), ('A', 3)] :: [(Char, Int)]
expected = [('D', 1), ('A', 3)] :: [(Char, Int)]
reduce' :: [(Char, Int)] -> [(Char, Int)]
reduce' [ ] = [] -- exit
reduce' ( (_,0):xs) = reduce' xs
reduce' (x1:(_,0):xs) = reduce' (x1:xs)
reduce' ( (x,n):[]) = (x,n):[] -- exit
reduce' ( (x1,n1):(x2,n2):[]) -- [previous,current,NONE]
| x1 == x2 = reduce' ((x1, d4 (n1+n2)):[])
| otherwise = (x1,n1):( -- advance
reduce' ((x2, d4 n2 ):[]))
reduce' ((x1,n1):(x2,n2):(x3,n3):xs) -- [previous,current,next]
| n3 == 0 = reduce' ((x1, d4 n1 ):(x2, d4 n2 ):xs)
| n2 == 0 = reduce' ((x1, d4 n1 ):(x3, d4 n3 ):xs)
| x2 == x3 = reduce' ((x1, d4 n1 ):(x2, d4 (n2+n3)):xs)
| x1 == x2 = reduce' ((x2, d4 (n1+n2)):(x3, d4 n3 ):xs)
| otherwise = (x1,n1):( -- advance
reduce' ((x2, d4 n2 ):(x3, d4 n3 ):xs)
)
-- Helpers
flatten :: [(Char, Int)] -> String
flatten nested = concat $ (\(x, n) -> replicate n x) <$> nested
nest :: String -> [(Char, Int)]
nest flat = zip (head <$> xg) (d4 .length <$> xg)
where xg = group flat
reduce = reduce' . nest
d4 = (`rem` 4)
Thoughts
My inputs are like the test
variable in the snipped above. We could keep doing flatten
then nest
until its result doesn't change, and would definitely look simpler. But it feels it is scanning the whole list many times, while my 3-pointer implementation scans the whole list only once. Maybe we can pop an element from left and add it to a new stack while merging identical consecutive items? Or maybe use Applicative Functors? E.g. this works but not sure about its efficiency/performance: reduce = (until =<< ((==) =<<)) (nest . flatten)
.