5

Write code for run -length encoding of a given string
Sample Input: aaaaaaaaaabcccccc
Output: a10bc6

My code:

static void Main(string[] args)
{
    string str = "aaaaaaaaaabcccccc";
    var qry = (from c in str
               group c by c into grp
               select new
               {
                   output = grp.Key.ToString() + grp.Count().ToString()
               });
    StringBuilder sb = new StringBuilder();
    foreach (var item in qry)
    {
        sb.Append(item.output);
    }
    Console.WriteLine(sb.ToString());
    Console.ReadLine();
}

However it returns:

a10b1c6

I want to remove the count for non-repeating char, here is "1" for letter 'b'.

Assume that it is a sorted string.

Alexei Levenkov
  • 98,904
  • 14
  • 127
  • 179
Hui Zhao
  • 655
  • 1
  • 10
  • 20

6 Answers6

4

add a ternary expression:

output = grp.Key + (grp.Count() > 1 ? grp.Count().ToString() : "")
Jonesopolis
  • 25,034
  • 12
  • 68
  • 112
3

Although OP did mention as an afterthought that in his/her case his source string was sorted, in general, the input to Run Length encoding won't be sorted as will lose information and can't be decompressed. Here's a take on the more general case of unsorted:

  string str = "aaaaaaaabccccccaadddddaaa"; // a8bc6a2d5a3

  // Zip the string with itself, offset by 1 character. 
  // Duplicate the last char to make strings equal length
  var pairs = str
    .Zip((str + str.Last()).Skip(1),
         (prev, current) => new { prev, current });

  // Retain a horrid mutable sequence which tracks consecutive characters
  var sequence = 0;
  var grps = pairs.GroupBy(p => 
    new { Ch = p.prev, 
          Sequence = p.current == p.prev
          ? sequence 
          : sequence++});

  // Join this together, using the other solutions to drop the count from single chars
  var rle = String.Join("", 
    grps.Select(g => g.Count() > 1
        ? g.Key.Ch.ToString() + g.Count().ToString() 
        : g.Key.Ch.ToString()));
  Console.WriteLine(rle);

Edit
I guess the number comments indicate some violation of the POLA which require explanation:

  • The string is Zipped with itself offset by one (Skip), in order to detect the boundaries of consecutive characters
  • Since Zip stops on the shortest enumeration, the last character is repeated on the shortest string to handle the final character in the string.
  • Unlike the 'sorted' RLE input string in the other answers, the Grouping key is done by the combination of character and a 'are the characters adjacent?' sequencer.
  • This sequence is rather horridly incremented within a conditional in the projection lambda of the GroupBy
  • @Jonesy's / @Tim's conditional join is used within a String.Join to reassemble the final encoded string.
StuartLC
  • 104,537
  • 17
  • 209
  • 285
  • The code works. If you could explain a little bit your algorithm, it would be great. Why did you create a new string which had same length and offset by 1? – Hui Zhao Dec 20 '14 at 00:26
  • You are right - there were many astonishing things in the code - I've updated with an explanation included. – StuartLC Dec 21 '14 at 05:09
1

You can use the conditional operator for the core issue. Another approach is to use a Lookup which is similar to a dictionary and String.Concat:

var charLook = input.ToLookup(c => c);
string result = string.Concat(charLook
    .Select(g => string.Format("{0}{1}", g.Key, g.Count()==1 ? "" : g.Count().ToString())));
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
1

Here's a simplified version:

public static void Main()
{
   string str = "aaaaaaaaaabcccccc";
    var qry = (from c in str
               group c by c into grp
               let c = grp.Count()
               select grp.Key.ToString() + (c > 1 ? c.ToString() : ""));

    Console.WriteLine(string.Join("",qry));
    Console.ReadLine();
}

You need to be careful with the bracket placement around the ternary expression, and then I used string.Join to avoid the mess with a for each loop and string builder.

Matt Burland
  • 44,552
  • 18
  • 99
  • 171
  • It doesn't fully work. Input string "uuggikejfhhhhtttttii hd" produces output "u2g2i3kejfh5t5 4d", the h in the end has gone missing... And if I add something in the beginning like "aaabbbdddduuggikejfhhhhtttttii hd" then the output is "a3b3d5u2g2i3kejfh5t5 4", now the d at the end has gone missing too? – CodeOrElse Feb 04 '17 at 14:01
  • 1
    @CodeOrElse OP's question explicitly makes reference to `Assume that it is a sorted string.` which allows for a more simpler "group and count" strategy. I've addressed the more general case of RLE in a separate answer. – StuartLC Mar 31 '17 at 18:36
0

Please check the code below, it might help:

StringBuilder sb = new StringBuilder();
string x = "aaaaaaaaaabcccccc";
char[] c = x.ToCharArray();
char[] t = c.Distinct().ToArray();
for (int i = 0; i < t.Length; i++)
{
   int count = 0;

   for (int j = 1; j < c.Length; j++)
   {  
       if (t[i] == c[j - 1])
       {
          count++;
       }
   }

   if (count > 1)
   {
       sb.Append(t[i] + count.ToString());
   }
   else
   {
       sb.Append(t[i]);
   }

}
Console.Write(sb);
Console.ReadKey();
Eray Balkanli
  • 7,752
  • 11
  • 48
  • 82
0

I just discovered a short RegEx solution. Posting here to share, along with a great RegEx cheat sheet.

Regex.Replace(input, @"(\D)\1+", c => c.Length.ToString() + c.Value[0]);
Steve Nyholm
  • 806
  • 9
  • 9