
I have a sequence of characters (e.g. "}çæø Ñ ") and I need to obtain a list of <char,count> pairs, where char is the ASCII character code and count is the number of consecutive repetitions of that same character.

The above sequence would thus read:

<125,1>
<135,1>
<145,1>
<32,5>
<155,1>
<32,3>

Is there a quick way of doing that with a Dictionary?

I need to ONLY count adjacent characters (see character 32 in the above example). I understand that Dictionaries can't have key repetitions, so is there another quick way that doesn't involve string iteration? I might have very long strings to process and iteration takes way too long.

user2729463
  • I think that this should be possible using Linq's `GroupBy` method – MindSwipe Dec 02 '19 at 07:24
  • Obligatory link: [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – John Wu Dec 02 '19 at 07:28
  • Do you really need ASCII? All those characters in your example (except the spaces and the curly brace) aren't ASCII – MindSwipe Dec 02 '19 at 07:29
  • `string source = ...; Dictionary<char, int> result = source.GroupBy(c => c).ToDictionary(group => group.Key, group => group.Count());` – Dmitry Bychenko Dec 02 '19 at 07:30
  • `string input = "}çæø Ñ "; var results = input.ToCharArray().Select(x => (int)x).GroupBy(x => x).Select(x => new { code = x.Key, count = x.Count() }).ToList();` – jdweng Dec 02 '19 at 07:33
  • Thanks @DmitryBychenko but your method groups everything. I only need to group adjacent repetitions. In my example above you can see there is a count of 5 for the space character (32) and then there is another count of. – user2729463 Dec 02 '19 at 07:34
  • Does this answer your question? [Count the characters individually in a string using c#](https://stackoverflow.com/questions/10830228/count-the-characters-individually-in-a-string-using-c-sharp) – xdtTransform Dec 02 '19 at 07:34
  • Did you even try **anything**? We're not here to do your work: thinking, trying out, thinking again. – MakePeaceGreatAgain Dec 02 '19 at 07:35
  • @user2729463: At least, you can't represent the result as a *dictionary* since keys must be *unique* (key = 32 can't repeat) – Dmitry Bychenko Dec 02 '19 at 07:38
  • @HimBromBeere Of course I've tried. I'm not just looking to count all characters, but ONLY adjacent characters. I'm not familiar with dictionaries, but I've noticed they're very quick and since I might be running into very long strings of characters, I can't use iteration, since it takes way too long. – user2729463 Dec 02 '19 at 07:39
  • I don't see how `"}çæø Ñ "` corresponds to the expected result. Could you give some more examples? What about `"aaa"` or `"aabbaabb"` or `"banana"`? – Corak Dec 02 '19 at 07:42
  • https://stackoverflow.com/questions/32432281/find-the-longest-repetition-character-in-string. What keywords did you use when you searched the internet for an answer? – xdtTransform Dec 02 '19 at 07:42
  • @xdtTransform Hi, tried it already. For very long strings that solution gives me an Overflow exception. – user2729463 Dec 02 '19 at 07:46
  • @Corak I read arbitrary character strings, but the above example could be rewritten as "abc d " and I need the result to look like <97,1><98,1><99,1><32,5><100,1><32,3> – user2729463 Dec 02 '19 at 07:49
  • @user2729463 - yes, again, I don't see why `"abc d "` would result in `<32,5>` and `<32,3>`. Are there space characters missing/removed? – Corak Dec 02 '19 at 07:51
  • @Corak The character sequence is 97 98 99 32 32 32 32 32 100 32 32 32. – user2729463 Dec 02 '19 at 07:53
  • @user2729463 - thanks, now it makes sense. The repeated spaces don't show up in the question. – Corak Dec 02 '19 at 07:54
  • The previously linked question is your answer. With the extension from https://stackoverflow.com/questions/4681949/use-linq-to-group-a-sequence-of-numbers-with-no-gaps, it totally does the trick, and the modifications are easy: `var result = text.GroupAdjacentBy((l, r) => l == r).Select(x => new { letter = x.First(), count = x.Count() });`. The "I tried it and it gives me XYZ" is hard to verify with the little information in your question. – xdtTransform Dec 02 '19 at 07:56
  • @xdtTransform I'm trying it on long strings and I get an Out of Memory exception. :( – user2729463 Dec 02 '19 at 08:14
  • Can you define "long string"? – xdtTransform Dec 02 '19 at 08:15
  • Just over 1 MB. – user2729463 Dec 02 '19 at 08:19
  • Still no reproduction: https://dotnetfiddle.net/pByzCM. For a 65.48Mb string it took 0.21 sec. You must be doing something wrong somewhere in your code. It may have little to do with your question. – xdtTransform Dec 02 '19 at 08:30
  • If you are worried that the extension method is overflowing, you can modify it to return a structure with the element and the number of occurrences instead of a list. It's pretty straightforward. – xdtTransform Dec 02 '19 at 08:35

2 Answers


The MoreLinq library has the method you need, see GroupAdjacent.

Usage:

// Requires the MoreLinq NuGet package (using MoreLinq; using System.Linq;)
string source = "}çæø     Ñ   ";
IEnumerable<(char c, int)> groups =
    source.GroupAdjacent(x => x, (c, lst) => (c, lst.Count()));

// Outputs ('}', 1) ('ç', 1) ('æ', 1) ('ø', 1) (' ', 5) ('Ñ', 1) (' ', 3)
Console.WriteLine(string.Join(" ", groups.Select((kv) => $"('{kv.Item1}', {kv.Item2})")));
Gebb

Standard Linq doesn't provide a GroupByAdjacent or similar method, but we can implement it with the help of a simple foreach loop. Note that we can't use Dictionary<char, int>, since a dictionary must have unique keys (Key == ' ' can't repeat):

  string source = "}çæø     Ñ   ";

  // We can't use Dictionary<char, int>
  // Let's put a list instead
  List<KeyValuePair<char, int>> result = new List<KeyValuePair<char, int>>();

  foreach (char c in source)
    if (result.Count <= 0 || result[result.Count - 1].Key != c)
      result.Add(new KeyValuePair<char, int>(c, 1));
    else
      result[result.Count - 1] = 
        new KeyValuePair<char, int>(c, result[result.Count - 1].Value + 1);

Let's have a look:

  string report = string.Join(Environment.NewLine, result
    .Select(pair => $"<'{pair.Key}' ({(int) pair.Key,3}) : {pair.Value}>"));

  Console.Write(report);

Outcome (please note that a char is not an ASCII character but a Unicode UTF-16 code unit):

<'}' (125) : 1>
<'ç' (231) : 1>
<'æ' (230) : 1>
<'ø' (248) : 1>
<' ' ( 32) : 5>
<'Ñ' (209) : 1>
<' ' ( 32) : 3>
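
The loop above keeps every run in a List. For very long inputs, one variation is to stream the runs lazily, in the spirit of the comment that suggests returning the element and its count instead of a list. The following is only a minimal sketch under that assumption: `RunLength` and `Encode` are hypothetical names chosen for illustration, not part of Linq, MoreLinq, or the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunLength
{
    // Hypothetical helper: lazily yields one (Code, Count) pair per run of
    // identical adjacent characters. Code is the UTF-16 code unit value.
    public static IEnumerable<(int Code, int Count)> Encode(string source)
    {
        if (string.IsNullOrEmpty(source))
            yield break; // null or empty input: no runs

        char current = source[0];
        int count = 1;

        for (int i = 1; i < source.Length; i++)
        {
            if (source[i] == current)
            {
                count++; // still inside the same run
            }
            else
            {
                // The run ended: emit it and start counting the next one.
                yield return ((int)current, count);
                current = source[i];
                count = 1;
            }
        }

        // Emit the final run, which the loop never flushes.
        yield return ((int)current, count);
    }
}
```

With the ASCII example from the comments, `string.Concat(RunLength.Encode("abc     d   ").Select(r => $"<{r.Code},{r.Count}>"))` produces `<97,1><98,1><99,1><32,5><100,1><32,3>`. The string is still scanned once (that part is unavoidable), but only the current run is held in memory at any time.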
Dmitry Bychenko