-2

I am working on compression algorithm for which I want to replace all consecutive number with its mathematical form which is not logical mathematically but my algorithm will know and convert it to original form.
Suppose I have string:

string input = "732183900000000000002389288888888888888";

Did you see it have 0000000000 and 8888888888888 are major consecutive duplicates.
And now I want to convert those to:

//convert 000000000 to 0*9. Means 9 times 0.
//convert 888888888 to 8*9. Means 8 times 0.
string output = "7321839" +
                "0*13"    +
                "23892"   +
                "8*14";
//or
string output = "7321839-0*13-23892-8*14";

Points to consider:
Any language that works on windows will be accepted. For me main thing is algorithm.
Please keep performance in mind as it would be used for big files.

Jarod42
  • 203,559
  • 14
  • 181
  • 302
  • 3
    What have you tried? How has it failed to meet your requirements? Stack Overflow isn't a code writing service. Regarding your question, it seems like a straightforward linear scan through the input string maintaining a count of how many times in a row it's seen the current digit would be fine, but you'd still have decide how many times a digit has to repeat before you replace its occurrences with the shorthand in the output. – Nathan Pierson Dec 08 '20 at 05:20
  • Thanks. I tried regex & other SO answers but without a luck. I believe it can be archived using c# regex. – Alex Hales Dec 08 '20 at 06:15
  • google RLE (Run Length Encoding) – Spektre Dec 08 '20 at 06:48
  • @Spektre Wow. I was not aware of RLE but my algorithm do same thing. You can answer and I will tick. I found many things. https://stackoverflow.com/a/27573642/14784394 , https://gist.github.com/sujaykundu777/9ee3328bf15fcc2b41a3576cf08b996c – Alex Hales Dec 08 '20 at 07:12
  • With [C# regex](https://dotnetfiddle.net/Z7xXL7). – Jarod42 Dec 08 '20 at 09:20
  • @AlexHales that is too trivial to be an answer ... RLE is pretty common .. for example PCX – Spektre Dec 08 '20 at 09:49

2 Answers2

2

To be honest this is as simple as it gets:

  • Parse through the string one character at a time.
  • Check if the previous character is the same as the current one.
  • If it is same then increment a counter variable or else reset it to 0.
  • If the counter value is greater than one when we reset the counter to 0 then add * to the result.
  • 1
    You probably would set the threshold higher than 2 instances in a row. It wouldn't make a lot of sense to "compress" `55` to `-5*2-`, for instance. But the basic idea seems sound. – Nathan Pierson Dec 08 '20 at 05:26
  • @NathanPierson Right. short strings should not be compressed. But long should be. – Alex Hales Dec 08 '20 at 06:17
  • @saurav-chanda Thanks. code implementation would be appreciated. Specially regex. – Alex Hales Dec 08 '20 at 06:18
0

Regex might be a bit convoluted for this given the rules for dashes (although not impossible by any means),

Seemingly, you want the following

  1. Groups of the same number greater than the count of 1
  2. No prefix dash
  3. No suffix dash
  4. No double dashes (speculative)

Here is a fairly efficient C# O(n) implementation with StringBuilder, which inurn should allow you to work with exceedingly large strings with minimal allocations

Given

public static string Shorten(string value)
{
    var sb = new StringBuilder(value.Length);
    int i, last;
    var isLastGroup = false;

    void Write()
    {
        var isGroup = i - last > 1;
        var getDash = last == 0 || isLastGroup ? "" : "-";
        sb.Append(isGroup ? $"{getDash}{value[last]}*{i - last}{(i != value.Length ? "-" : "")}" : value[last].ToString());
        isLastGroup = isGroup;
        last = i;
    }

    for (i = 0, last = 0; i < value.Length; i++)
        if (value[last] != value[i])
            Write();

    Write();

    return sb.ToString();
}

Tests

Console.WriteLine(Shorten("1"));
Console.WriteLine(Shorten("111"));
Console.WriteLine(Shorten("1112"));
Console.WriteLine(Shorten("1222"));
Console.WriteLine(Shorten("12233344445555512345"));

Results

1
13
1
3-2
1-23
1-2
2-33-44-5*5-12345

Full Demo Here

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
TheGeneral
  • 79,002
  • 9
  • 103
  • 141