Score: 2

There is a similar question posted, but I do not have the rep to ask a follow-up question in that thread. :(

If I have a List<T> that contains items that appear more than once, List.Distinct() will remove the duplicates, but one copy of each value still remains. If I want to remove every item that occurs more than once, including its first occurrence, what would be the most efficient way to do this in place on the original list?

Given a List<int> called oneTime:

{ 4, 5, 7, 3, 5, 4, 2, 4 }

The desired output would be in oneTime:

{ 7, 3, 2 }

Follow-up question for @Enigmativity:

Here is a pseudo version of what my script is doing. It runs in NinjaTrader, which runs on .NET 3.5.

I will attach a general idea of what the code is supposed to be doing. I'd attach the actual script, but unless you are using NinjaTrader, it might not be of use.

Essentially, there is a large z loop. Each time through, a series of numbers is added to 'LiTics', which I do not want to disturb. I then pass that list to the function, which returns a list of the values that occur only once. I'd like to see those numbers each time through the loop.

It works initially, but when I run this on various sets of data, after a few passes through the loop it starts reporting values that occur more than once. I'm not sure why.

for (int z = 1; z <= 10000; z += 1) // Runs many times
{
    if (BarsInProgress == 0 && CurrentBar - oBarTF1 > 0 && startScript) // Some condition
    {
        for (double k = Low[0]; k <= High[0]; k += TickSize)
        {
            // Adds a series of numbers to this list each time through the z loop.
            // This is the original list, which I do not want to modify.
            LiTics.Add(k);
        }

        LiTZ.Clear(); // List used to display the results; cleared before populating
        LiTZ = GetTZone(LiTics); // Function from this thread (below);
                                 // passes the original list, which grows on every loop
        foreach (double prime in LiTZ)
        {
            Print(Times[0] + ",  " + prime); // Print to see the results
        }
    }

} // End of the bigger 'z' loop

//Function created to get values that appear ONLY once
public List<double> GetTZone(List<double> sequence) 
{  
    var result =
        sequence
            .GroupBy(x => x)
            .Where(x => !x.Skip(1).Any())
            .Select(x => x.Key)
            .ToList();
    return result;
}

A picture of the printout and what is going wrong: Screenshot.

Theodor Zoulias
pelt

3 Answers

Score: 7

So, if you can have a new list, then this is the easiest way to do it:

var source = new List<int>() { 4, 5, 7, 3, 5, 4, 2, 4 };

var result =
    source
        .GroupBy(x => x)
        .Where(x => !x.Skip(1).Any())
        .Select(x => x.Key)
        .ToList();

This gives:

{ 7, 3, 2 }

If you want to remove the values from the original source, then do this:

var duplicates =
    new HashSet<int>(
        source
            .GroupBy(x => x)
            .Where(x => x.Skip(1).Any())
            .Select(x => x.Key));

source.RemoveAll(n => duplicates.Contains(n));
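For reference, here is the in-place version above as a complete top-level program. Top-level statements need C# 9 or later, and the string.Join<T> overload used for printing needs .NET 4 or later. The removal itself relies only on LINQ, HashSet<T> and List<T>.RemoveAll, all of which exist in .NET 3.5. The `alias` variable is purely illustrative: it shows that the original list object is modified rather than replaced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same input as in the question.
var source = new List<int> { 4, 5, 7, 3, 5, 4, 2, 4 };
List<int> alias = source; // illustrative: a second reference to the same list object

// Collect every value that occurs at least twice.
var duplicates = new HashSet<int>(
    source
        .GroupBy(x => x)
        .Where(g => g.Skip(1).Any())
        .Select(g => g.Key));

// RemoveAll mutates the list in place and returns how many elements it removed.
int removed = source.RemoveAll(n => duplicates.Contains(n));

Console.WriteLine(string.Join(", ", alias)); // 7, 3, 2
Console.WriteLine(removed);                   // 5
```

Because the alias sees the change, any other code holding the original list also observes the removal. The relative order of the surviving elements is preserved.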
Enigmativity
  • Are there any known issues with how many times this can be called? I have a function that executes the code above by creating a new list, as in the first example. It works as expected initially, but after it is called repeatedly, it starts returning lists with numbers that occur more than once. – pelt Apr 08 '16 at 06:15
  • @pelt - This should work each and every time. Can you show some example code of it failing? – Enigmativity Apr 08 '16 at 07:04
  • I attached an example above, labeled 'edit', of where I seem to be having a problem. The code that shows the values occurring only once runs inside a loop that adds values to the list on each increment. Each time, I'd like to display the list of values that occur once; later I will do something with these values. – pelt Apr 08 '16 at 14:22
  • @pelt - I can see a few weird things in your code already, but I can't run it to really test it. You really need to provide enough code for me to copy and paste into a dev environment and run it. I can tell you, however, that my code isn't causing you the issue. Can you please post a minimal, complete, and verifiable example? – Enigmativity Apr 08 '16 at 14:35
  • @pelt - Here are some starting issues. (1) When you call `LiTZ=GetTZone(LiTics);` you are assigning a brand-new list to `LiTZ`, so there is no need at all to call `LiTZ.Clear()` first. (2) You have two separate uses of the variable `k`. (3) You are looping with a `double` as the loop variable, which could mean you have rounding errors. – Enigmativity Apr 08 '16 at 14:38
  • Thanks so much! I don't think I would have figured that out; it was definitely a rounding error. The duplicate 'k' loops were my mistake when pasting the example loop above: my actual script has a different loop, and I forgot to change those 'k' values to 'z'. – pelt Apr 08 '16 at 18:21
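The rounding problem from the comments can be reproduced outside NinjaTrader. In this sketch, `low`, `high` and `tickSize` are made-up stand-ins for `Low[0]`, `High[0]` and `TickSize`. Keying by an integer tick index is one common workaround and an assumption on my part, not necessarily the fix pelt used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for NinjaTrader's Low[0], High[0] and TickSize.
double low = 1.0, high = 2.0, tickSize = 0.1;

// Accumulating the step in a double, as the loop in the question does:
// each += can round, so later values drift away from the exact tick prices.
var accumulated = new List<double>();
for (double k = low; k <= high; k += tickSize)
    accumulated.Add(k);

// 1.0 + 0.1 + 0.1 is not the same double as the literal 1.2, so GroupBy
// treats two "equal" prices computed differently as distinct keys.
Console.WriteLine(accumulated[2] == 1.2); // False

// Workaround sketch: iterate over integer tick indexes and group by those,
// converting back to a price (index * tickSize) only for display.
long firstTick = (long)Math.Round(low / tickSize);
long lastTick = (long)Math.Round(high / tickSize);
var ticks = new List<long>();
for (long t = firstTick; t <= lastTick; t++)
    ticks.Add(t);

// A price of 1.2 obtained elsewhere maps to the same key as the third tick.
long keyFor12 = (long)Math.Round(1.2 / tickSize);
Console.WriteLine(ticks[2] == keyFor12); // True
Console.WriteLine(ticks.Count);          // 11 (indexes 10 through 20)
```

Integer keys compare exactly, so GetTZone's GroupBy would then only see duplicates that really are the same tick.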
Score: 0

Here is an extension method for the List<T> class that removes from the list all the items that appear more than once:

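using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Extension methods must live in a static, non-generic class; the name is arbitrary.
public static class ListExtensions
{
    /// <summary>
    /// Removes all the elements that have a key that appears more than once,
    /// according to a specified key selector function.
    /// </summary>
    /// <returns>The number of elements removed from the list.</returns>
    public static int RemoveDuplicatesByKey<TSource, TKey>(this List<TSource> list,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer = default)
    {
        ArgumentNullException.ThrowIfNull(list);
        ArgumentNullException.ThrowIfNull(keySelector);
        Dictionary<TKey, int> occurrences = new(list.Count, comparer);
        // GetValueRefOrAddDefault returns a ref to the stored count (adding 0 for a
        // new key), so a single hash lookup both reads and increments it.
        foreach (TSource item in list)
            CollectionsMarshal.GetValueRefOrAddDefault(
                occurrences, keySelector(item), out _)++;
        // Remove every item whose key was counted more than once.
        return list.RemoveAll(item => occurrences.TryGetValue(
            keySelector(item), out int value) && value > 1);
    }
}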

The occurrences of each element are counted with a Dictionary<TKey, int>, using the CollectionsMarshal.GetValueRefOrAddDefault method (.NET 6) for efficiency.
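CollectionsMarshal needs .NET 6. On older frameworks, such as the .NET 3.5 environment mentioned in the question, the counting step could fall back to Dictionary.TryGetValue at the cost of a second lookup per element. The helper below is a hypothetical sketch (the name RemoveDuplicatesByKeyLegacy is mine), written as a local function so it runs as a top-level program. Its body uses only APIs that exist in .NET 3.5, while the surrounding demo uses modern C# syntax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts keys with TryGetValue instead of CollectionsMarshal,
// then removes every item whose key occurs more than once.
static int RemoveDuplicatesByKeyLegacy<TSource, TKey>(
    List<TSource> list, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
{
    var occurrences = new Dictionary<TKey, int>(comparer ?? EqualityComparer<TKey>.Default);
    foreach (TSource item in list)
    {
        TKey key = keySelector(item);
        int count;
        occurrences.TryGetValue(key, out count); // count stays 0 when the key is new
        occurrences[key] = count + 1;            // second lookup to store the new count
    }
    // Every key is present at this point, so the indexer cannot throw.
    return list.RemoveAll(item => occurrences[keySelector(item)] > 1);
}

// Keys compared case-insensitively: "a"/"A" and "B"/"b" count as duplicates.
var list = new List<string> { "a", "B", "b", "c", "A", "d" };
int removed = RemoveDuplicatesByKeyLegacy(list, s => s, StringComparer.OrdinalIgnoreCase);
Console.WriteLine(string.Join(", ", list)); // c, d
Console.WriteLine(removed);                  // 4
```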

Usage example:

List<int> list = new() {4, 5, 7, 3, 5, 4, 2, 4};
Console.WriteLine($"Before: [{String.Join(", ", list)}]");
int removedCount = list.RemoveDuplicatesByKey(x => x);
Console.WriteLine($"After: [{String.Join(", ", list)}], Removed: {removedCount}");

Output:

Before: [4, 5, 7, 3, 5, 4, 2, 4]
After: [7, 3, 2], Removed: 5

Online Demo.

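Note: The keySelector delegate should not throw exceptions. List<T>.RemoveAll compacts the list in place, so if keySelector throws partway through the removal, the list can be left partially compacted, and RemoveDuplicatesByKey might introduce new duplicates instead of removing the existing ones.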

Theodor Zoulias
Score: -3

I have two options for you: one that uses HashSet and the other LINQ.

Option 1:

Using a HashSet, loop through the collection, inserting each number if it does not exist and removing it if it does.

HashSet<int> hash = new HashSet<int>();

foreach(var number in list)
{
    if(!hash.Contains(number)) hash.Add(number);
    else hash.Remove(number);               
}
list = hash.ToList();

Option 2:

Simple LINQ: group the elements and keep only those whose count is 1.

list = list.GroupBy(g => g)
    .Where(e => e.Count() == 1)
    .Select(g => g.Key)
    .ToList();

There is a big performance gain from using HashSet over LINQ: LINQ (in this case) requires multiple iterations, whereas the HashSet version uses a single iteration and provides O(1) lookups for adding and removing.

Elapsed Time (Using Linq): 8808 Ticks
Elapsed Time (Using HashSet): 51 Ticks

Working Demo

Hari Prasad
  • As the OP, I did not downvote. I just returned to see this... I don't know who did the downvote. – pelt Apr 05 '16 at 03:19
  • @pelt That's sad, people down voting without explaining the reason. – Hari Prasad Apr 05 '16 at 03:21
  • I'm seriously trying to understand how the other answer is different from mine; that answer is upvoted. – Hari Prasad Apr 05 '16 at 03:22
  • I don't see how the demo is related to this problem. Perhaps that's where the downvote came from? Otherwise, while I haven't tested your solution, the information about efficiency is very well received. – flux9998 Apr 25 '16 at 20:35
  • This was likely downvoted because option 1 simply doesn't work. Using the input values in the question, this produces `{ 2, 4, 7, 3 }` even though there are three `4` values. Remove a `4` and it correctly produces `{ 2, 7, 3 }`. The problem is your `if`/`else`, which has the effect of treating a value as unique if it is repeated an _odd_ number of times. Also, the demo links to unrelated XML-parsing code. By the way, `Add()` returns a `bool` indicating if the value was added or already existed, so you don't ever need to do a conditional add with `if (!hash.Contains(number)) hash.Add(number);`. – Lance U. Matthews Feb 11 '20 at 19:27
  • The question asks about in-place removal. This answer shows how to create a new list based on an existing list, leaving the original list unchanged, so it doesn't answer the question. On top of that there is no guarantee that the items in the new list will be in the same order as the items in the original list. – Theodor Zoulias Jan 06 '23 at 04:12
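As the comments point out, Option 1 toggles membership, so a value repeated an odd number of times survives, and it builds a new list instead of changing the original. A minimal corrected sketch under those observations keeps the single pass but records repeats in a second set (the names `seen` and `repeated` are my own), then removes them in place with RemoveAll.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 4, 5, 7, 3, 5, 4, 2, 4 };

// seen: every value encountered so far; repeated: values encountered at least twice.
var seen = new HashSet<int>();
var repeated = new HashSet<int>();
foreach (int number in list)
{
    // HashSet<T>.Add returns false when the value is already in the set.
    if (!seen.Add(number))
        repeated.Add(number);
}

// Remove in place; the relative order of the remaining items is preserved.
list.RemoveAll(repeated.Contains);

Console.WriteLine(string.Join(", ", list)); // 7, 3, 2
```

A value that appears three times lands in `repeated` on its second occurrence and stays there, so the odd-count problem from the comment does not arise.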