Modify duplicate values with duplication index suffix (using Linq)

Question

I have a list:

List<string> myList = new List<string>{ "dog", "cat", "dog", "bird" };

I want the output to be list of:

"dog (1)", "cat", "dog (2)", "bird"

I've already looked through this question but it is only talking about count the duplicates, my output should be with its duplicate index. like duplicate (index)

I've tried this code:

var q = list.GroupBy(x => x)
            .Where(y => y.Count()>1)
            .Select(g => new {Value = g.Key + "(" + g.Index + ")"})

but it does not seem to work because:

Need to return all of my list back \ Or just modify the existing one. (and my answer returning only the duplicate ones)
For duplicate values need to add a prefix based on their "duplicate index".

How to do this in C#? Is there a way using Linq?

What have you tried? What specifically is not working? Do you want the order to be preserved? You need to include a [MCVE] with your question if you expect anyone to be able to help you — maccettura, Feb 15 '18 at 19:43
i tried group by and where (t=>t.Count > 1) and then Select with the new operator object but all i got is only `"dog (2)", "cat", "dog (2)", "bird"` — Shahar Shokrani, Feb 15 '18 at 19:46
Possible duplicate of [How to Count Duplicates in List with LINQ](https://stackoverflow.com/questions/454601/how-to-count-duplicates-in-list-with-linq) — Francesco B., Feb 15 '18 at 19:48
Do you want to modify the values in the list, or just the formatted output? — Rufus L, Feb 15 '18 at 19:55
@RufusL i prefer modify the values in the list but i can work with the formatted output also. — Shahar Shokrani, Feb 15 '18 at 20:02
@FrancescoB. regard the possible duplication see my updated question - it has nothing to do with counting the duplicate (only modifying them based on their duplicate index) — Shahar Shokrani, Feb 15 '18 at 20:11
I note that you say that you need to add an indexing prefix, and you actually add a suffix. — Eric Lippert, Feb 15 '18 at 20:29

Eric Lippert · Answer 1 · 2018-02-15T20:32:13.267

The accepted solution works but is extremely inefficient when the size of the list grows large.

What you want to do is first get the information you need in an efficient data structure. Can you implement a class:

sealed class Counter<T>
{
  public void Add(T item) { }
  public int Count(T item) { }
}

where Count returns the number of times (possibly zero) that Add has been called with that item. (Hint: you could use a Dictionary<T, int> to good effect.)

All right. Now that we have our useful helper we can:

var c1 = new Counter<string>();
foreach(string item in myList)
  c1.Add(item);

Great. Now we can construct our new list by making use of a second counter:

var result = new List<String>();
var c2 = new Counter<String>();
foreach(string item in myList)
{
  c2.Add(item);
  if (c1.Count(item) == 1))
    result.Add(item);
  else
    result.Add($"{item} ({c2.Count(item)})");
}

And we're done. Or, if you want to modify the list in place:

var c2 = new Counter<String>();
// It's a bad practice to mutate a list in a foreach, so
// we'll be sticklers and use a for.
for (int i = 0; i < myList.Count; i = i + 1)
{
  var item = myList[i];
  c2.Add(item);
  if (c1.Count(item) != 1))
    myList[i] = $"{item} ({c2.Count(item)})";
}

The lesson here is: create a useful helper class that solves one problem extremely well, and then use that helper class to make the solution to your actual problem more elegant. You need to count things to solve a problem? Make a thing-counter class.

Rufus L · Answer 2 · 2018-02-15T20:11:50.837

One way to do this is to simply create a new list that contains the additional text for each item that appears more than once. When we find these items, we can create our formatted string using a counter variable, and increment the counter if the list of formatted strings contains that counter already.

Note that this is NOT a good performing solution. It was just the first thing that came to my head. But it's a place to start...

private static void Main()
{
    var myList = new List<string> { "dog", "cat", "dog", "bird" };

    var formattedItems = new List<string>();

    foreach (var item in myList)
    {
        if (myList.Count(i => i == item) > 1)
        {
            int counter = 1;
            while (formattedItems.Contains($"{item} ({counter})")) counter++;
            formattedItems.Add($"{item} ({counter})");
        }
        else
        {
            formattedItems.Add(item);
        }
    }

    Console.WriteLine(string.Join(", ", formattedItems));

    Console.Write("\nDone!\nPress any key to exit...");
    Console.ReadKey();
}

Output

Your solution is quadratic in the size of the original list; can you devise a solution which is more efficient as the list grows longer? — Eric Lippert, Feb 15 '18 at 20:09
Probably but I unfortunately have to run. This was just the first thing that popped into my head. I was about to edit it to say that this is **not** a good performing solution. — Rufus L, Feb 15 '18 at 20:10
@RufusL thanks for the answer, I've undo the acceptation for now because of the performance issue — Shahar Shokrani, Feb 15 '18 at 20:14

Rotem · Accepted Answer · 2018-02-15T20:45:41.667

This solution is not quadratic with respect to size of list, and it modifies the list in place as preferred in OP.

Any efficient solution will involve a pre-pass in order to find and count the duplicates.

List<string> myList = new List<string>{ "dog", "cat", "dog", "bird" };

//map out a count of all the duplicate words in dictionary.
var counts = myList
    .GroupBy(s => s)
    .Where(p => p.Count() > 1)
    .ToDictionary(p => p.Key, p => p.Count());

//modify the list, going backwards so we can take advantage of our counts.
for (int i = myList.Count - 1; i >= 0; i--)
{
    string s = myList[i];
    if (counts.ContainsKey(s))
    {
        //add the suffix and decrement the number of duplicates left to tag.
        myList[i] += $" ({counts[s]--})";
    }
}

Rufus L · Answer 4 · 2018-02-15T20:50:42.453

1

Ok, @EricLippert challenged me and I couldn't let it go. Here's my second attempt, which I believe is much better performing and modifies the original list as requested. Basically we create a second list that contains all the duplicate entries in the first. Then we walk backwards through the first list, modifying any entries that have a counterpart in the duplicates list, and remove the item from the duplicates list each time we encounter one:

private static void Main()
{
    var myList = new List<string> {"dog", "cat", "dog", "bird"};
    var duplicates = myList.Where(item => myList.Count(i => i == item) > 1).ToList();

    for (var i = myList.Count - 1; i >= 0; i--)
    {
        var numDupes = duplicates.Count(item => item == myList[i]);
        if (numDupes <= 0) continue;
        duplicates.Remove(myList[i]);
        myList[i] += $" ({numDupes})";
    }

    Console.WriteLine(string.Join(", ", myList));

    Console.Write("\nDone!\nPress any key to exit...");
    Console.ReadKey();
}

Output

edited Feb 15 '18 at 20:50

answered Feb 15 '18 at 20:47

Rufus L

36,127
5
30
43

I believe this is still quadratic, `myList.Count()` is an `O(N)` operation, and you're calling it for each element of the list. – Rotem Feb 15 '18 at 20:50
Oh, rats. Thanks @Rotem – Rufus L Feb 15 '18 at 20:51
Yeah, I ran some perf tests with 20,000 duplicate entries using by mine and @EricLippert solutions, and his was 610 times faster. Talk about humbling! But a great learning experience :) – Rufus L Feb 15 '18 at 21:10
Although, to take a small amount of credit, my second solution is WAYYY faster than my first. – Rufus L Feb 15 '18 at 21:14
Im curious, did you test against mine? – Rotem Feb 15 '18 at 21:16
@Rotem Yes, yours is faster than Erics!! Well, he didn't show his implementation of the `Counter` class, so maybe there's something I missed there. I'll link to the code i used in a sec... – Rufus L Feb 15 '18 at 21:18
@RufusL: I would have just used a dictionary as the counter. It should have approximately the same perf characteristics as Rotem's. The point of my answer was that you can sometimes pay a small penalty of having to implement an abstraction but the payoff is that your code looks very elegant. Rotem's code is efficient, but it looks like the code is all about manipulating dictionaries; mine looks like it is about *counting things*. – Eric Lippert Feb 15 '18 at 21:30
Yeah, only one run actually was faster than yours, the others were very close, but @EricLippert was faster. Here's my [horse race](https://dotnetfiddle.net/MmqhYI). Ok, now I *really* have to go! Thanks for the fun afternoon exercise and learning – Rufus L Feb 15 '18 at 21:40
@Rufus You can optimize Eric's implementation further by using `TryGetValue` in the `Count` method instead of `if (ContainsKey)...`. – Rotem Feb 16 '18 at 05:54
@Rotem Interesting. How does that optimize, though? I think both [`ContainsKey`](http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,22fd7cd7408aed6e,references) and [`TryGetValue`](http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,2e5bc6d8c0f21e67,references) call the same private method to get the key. And `TryGetValue` returns the value if the index is >= 0 like I am. What am I missing? – Rufus L Feb 16 '18 at 06:29
@Rufus Correct but in your case you are getting the value twice. Once in `ContainsKey` and once in the index operator. `TryGetValue` combines these actions in the same operation. – Rotem Feb 16 '18 at 18:43
`ContainsKey` does not get the value, though - it returns `true` if `FindEntry` returns a valid index. If it returns true, then I get the value by index in my `if` block. `TryGetValue` also calls `FindEntry`, and if that returns true then `TryGetValue` returns the value by index in it's own `if` block. – Rufus L Feb 16 '18 at 20:22
@RufusL The expensive operation *is* `FindEntry`. In your case it is called twice, one for `ContainsKey` and once to get the value. The whole purpose of `TryGetValue` is to avoid executing `FindEntry` twice. – Rotem Feb 19 '18 at 07:56
@Rotem Oh, got it! I didn't realize it was being called when accessing the item by index. Thanks! – Rufus L Feb 19 '18 at 16:40

Modify duplicate values with duplication index suffix (using Linq)

4 Answers4