0

I want an efficient way of grouping strings whilst keeping duplicates and order. Something like this

1100110002200   -> 101020

I tried this previously

_case.GroupBy(c => c).Select(g => g.Key)

but I got 102

But this gives me what I want, I just want to optimize it, so I wouldn't have to scour the entire list each time

static List<char> group(string _case)
{
    var groups = new List<char>();
    for (int i = 0; i < _case.Length; i++)
    {
        if (groups.LastOrDefault() != _case[i])
            groups.Add(_case[i]);
    }
    return groups;
}
staa99
  • 191
  • 1
  • 2
  • 13
  • 1
    This seems like a better suit for [Code Review](https://codereview.stackexchange.com) since you have working code that you want to improve, but be sure to check [what's on topic there](https://codereview.stackexchange.com/help/on-topic) – Camilo Terevinto Mar 14 '18 at 01:01
  • 3
    What does "scour the entire list each time" mean? (Note: [for a List, `LastOrDefault` is an O(1) operation](https://stackoverflow.com/a/2799543/2864740).) How does it differ from "scouring" with GroupBy/Select/LINQ? – user2864740 Mar 14 '18 at 01:02
  • @CamiloTerevinto Thanks, I will check there right away – staa99 Mar 14 '18 at 01:05
  • @user2864740 I meant to request for a more efficient option, scouring the list with groupby is something I expect to be done in a more optimized manner. My expectations may be unattainable in some cases though – staa99 Mar 14 '18 at 01:07
  • I think your approach is pretty good TBH. It's about as non-scoury as you're likely to get (a single pass with indexed O(1) lookups). I wouldn't expect anything in LINQ-to-Objects to be this efficient. – spender Mar 14 '18 at 01:12
  • OK, thank you all very much. At least I can sleep without worry for now. – staa99 Mar 14 '18 at 01:15
  • I would use : _case.GroupBy(c => c).ToList(); – jdweng Mar 14 '18 at 01:16
  • 2
    @jdweng That doesn't do what he wants to though. – ProgrammingLlama Mar 14 '18 at 01:19
  • John, are you sure? – jdweng Mar 14 '18 at 01:24
  • 2
    @jdweng Yes, I am. He wants to take `1100110002200` and get `101020`. GroupBy will not do this as it will take all of the 1s and make a single group, another for the 0s, another for the 2s, etc. – ProgrammingLlama Mar 14 '18 at 01:26
  • Not sure what you are asking... Code shown in the post is O(string_length), there is nothing you can do to make it algorithmically faster. You can experiment with counting length first or caching last character... – Alexei Levenkov Mar 14 '18 at 01:34
  • Are you sure it is not a key and a value? – jdweng Mar 14 '18 at 08:53
  • Thank you all, I will try out Lejsek's answer. – staa99 Mar 15 '18 at 00:40

2 Answers2

4

You could create a method that loops each character and checks the previous character for equality. If they aren't the same, append/yield return the character. This is pretty easy to do with Linq.

public static string Simplify(string str)
{
    return string.Concat(str.Where((c, i) => i == 0 || c != str[i - 1]));
}

Usage:

string simplified = Simplify("1100110002200");
// 101020

In my testing, my method and yours are roughly equal in speed, mine being insignificantly slower after 10 million executions (4260ms vs 4241ms).

However, my method returns the result as a string whereas yours doesn't. If you need to convert your result back to a string (which is likely) then my method is indeed much faster/more efficient (4260ms vs 6569ms).

4

While I like the elegant solution of rshepp, it turns out that the very basic code can run even 5 times faster than that.

public static string Simplify2(string str)
{
    if (string.IsNullOrEmpty(str)) { return str; }

    StringBuilder sb = new StringBuilder();
    char last = str[0];
    sb.Append(last);

    foreach (char c in str)
    {
        if (last != c)
        {
            sb.Append(c);
            last = c;
        }
    }

    return sb.ToString();
}
Antonín Lejsek
  • 6,003
  • 2
  • 16
  • 18