1

I have been trying to compress a String. Given a String like this:

AAABBCAADEEFF, I would need to compress it like 3A2B1C2A1D2E2F

I was able to come up with a tail recursive implementation:

    @scala.annotation.tailrec
    def compress(str: List[Char], current: Seq[Char], acc: Map[Int, String]): String = str match {
      case Nil =>
        if (current.nonEmpty)
          s"${acc.values.mkString("")}${current.length}${current.head}"
        else
          s"${acc.values.mkString("")}"
      case List(x) if current.contains(x) =>
        val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length + 1}${current.head}")
        compress(List.empty[Char], Seq.empty[Char], newMap)
      case x :: xs if current.isEmpty =>
        compress(xs, Seq(x), acc)
      case x :: xs if !current.contains(x) =>
        if (acc.nonEmpty) {
          val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length}${current.head}")
          compress(xs, Seq(x), newMap)
        } else {
          compress(xs, Seq(x), acc ++ Map(1 -> s"${current.length}${current.head}"))
        }
      case x :: xs =>
        compress(xs, current :+ x, acc)
    }

// Produces 2F3A2B1C2A instead of 3A2B1C2A1D2E2F
    compress("AAABBCAADEEFF".toList, Seq.empty[Char], Map.empty[Int, String]) 

It fails however for the given case! Not sure what edge scenario I'm missing! Any help?

So what I'm actually doing is, going over the sequence of characters, collecting identical ones into a new Sequence and as long as the new character in the original String input (the first param in the compress method) is found in the current (the second parameter in the compress method), I keep collecting it.

As soon as it is not the case, I empty the current sequence, count and push the collected elements into the Map! It fails for some edge cases that I'm not able to make out!

joesan
  • 13,963
  • 27
  • 95
  • 232

4 Answers4

4

I came up with this solution:

def compress(word: List[Char]): List[(Char, Int)] =
  word.map((_, 1)).foldRight(Nil: List[(Char, Int)])((e, acc) =>
  acc match {
    case Nil => List(e)
    case ((c, i)::rest) => if (c == e._1) (c, i + 1)::rest else e::acc
  })

Basically, it's a map followed by a right fold.

nicodp
  • 2,362
  • 1
  • 11
  • 20
3

Took inspiration from the @nicodp code

def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)

First our intermediate result will be List[(Char, Int)]. List of tuples of chars each char will be accompanied by its count.

Now lets start going through the list one char at once using the Great! foldLeft

We will accumulate the result in the acc variable and e represents the current element.

acc is of type List[(Char, Int)] and e is of type Char

Now when we start, we are at first char of the list. Right now the acc is empty list. So, we attach first tuple to the front of the list acc with count one.

when acc is Nil do (e, 1) :: Nil or (e, 1) :: acc note: acc is Nil

Now front of the list is the node we are interested in.

Lets go to the second element. Now acc has one element which is the first element with count one.

Now, we compare the current element with the front element of the list if it matches, increment the count and put the (element, incrementedCount) in the front of the list in place of old tuple.

if current element does not match the last element, that means we have new element. So, we attach new element with count 1 to the front of the list and so on.

then to convert the List[(Char, Int)] to required string representation.

Note: We are using front element of the list which is accessible in O(1) (constant time complexity) has buffer and increasing the count in case same element is found.

Scala REPL

scala> :paste
// Entering paste mode (ctrl-D to finish)

def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)

// Exiting paste mode, now interpreting.

encode: (word: String)String

scala> encode("AAABBCAADEEFF")
res0: String = 3A2B1C2A1D2E2F

Bit more concise with back ticks e instead of guard in pattern matching

   def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((`e`, lastCharCount) :: xs) => (e, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
Nagarjuna Pamu
  • 14,737
  • 3
  • 22
  • 40
1

Here's another more simplified approach based upon this answer:

class StringCompressinator {
  def compress(raw: String): String = {
    val split: Array[String] = raw.split("(?<=(.))(?!\\1)", 0) // creates array of the repeated chars as strings

    val converted = split.map(group => {
      val char = group.charAt(0) // take first char of group string
      s"${group.length}${char}"  // use the length as counter and prefix the return string "AAA" becomes "3A"
    })
    converted.mkString("") // converted is again array, join turn it into a string
  }
}

import org.scalatest.FunSuite

class StringCompressinatorTest extends FunSuite {

  test("testCompress") {
    val compress = (new StringCompressinator).compress(_)
    val input = "AAABBCAADEEFF"
    assert(compress(input) == "3A2B1C2A1D2E2F")
  }
}
k0pernikus
  • 60,309
  • 67
  • 216
  • 347
0

Similar idea with slight difference :

  • Case class for pattern matching the head so we don't need to use if and it also helps on printing end result by overriding toString
  • Using capital letter for variable name when pattern matching (either that or back ticks, I don't know which I like less :P)

       case class Count(c : Char, cnt : Int){
          override def toString = s"$cnt$c"
       }
    
       def compressor( counts : List[Count], C : Char ) = counts match {
            case Count(C, cnt) :: tail => Count(C, cnt + 1) :: tail
            case _ => Count(C, 1) :: counts
       }
    
       "AAABBCAADEEFF".foldLeft(List[Count]())(compressor).reverse.mkString
       //"3A2B1C2A1D2E2F"
    
wibisono
  • 61
  • 4