Remove Duplicate item from datatable that starts with alphabet

Question

I'm trying to remove duplicate data from a DataTable, but not by simply keeping the first entry and removing the second duplicate onward. I need to set a condition so that the incorrect entry is the one that gets removed.

For example:

ID          Value
111          A
222          B
333          C
444          A

I want to remove row 111 and keep row 444, because both have the duplicate value A. The other solutions I found would remove 444 instead. The closest thing I could find that relates to my question is this: Remove Duplicate item from list based on condition

The answer there uses LINQ, which I'm not familiar with. I was thinking of using "StartsWith" to filter for the data I want to keep, but I have no idea how to work it into the query.

var result = items
    .GroupBy(item => item.Name)
    .SelectMany(g => g.Count() > 1 ? g.Where(x => x.Price != 500) : g); // <-- I want to apply StartsWith here

I'd really appreciate it if someone could help me with this.

Why do you want to remove the entry with ID = 111 instead of ID = 444? — Michał Turczyn, Apr 25 '19 at 06:15
So what are the criteria for removing `111` and keeping `444`? What if `A` has the IDs `111`, `222`, `333`, `444`, `555`? Which ones should be removed? — er-sho, Apr 25 '19 at 06:16

Potato · Answer 1 · 2019-04-25T06:25:50.763

I think you need something like

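var result = items
    .GroupBy(item => item.Name)
    // keep only the groups that contain duplicates and whose key is "A"
    // (use g.Key.StartsWith("A") to match on a prefix instead)
    .Where(g => g.Count() > 1 && g.Key == "A")
    // flatten the remaining groups back into a single sequence
    .SelectMany(g => g);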

This returns a sequence containing all the "A" elements; you can then decide which of them to delete.

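To select all duplicates for deletion, leaving only the last inserted element of each group:

var result = items
    .GroupBy(item => item.Name)
    // only groups with more than one element contain duplicates
    .Where(g => g.Count() > 1)
    .SelectMany(g =>
    {
        // the element with the highest ID is treated as the last inserted one
        var mainElement = g.OrderByDescending(x => x.ID).First();
        // every other element of the group is a duplicate to delete
        return g.Where(x => x.ID != mainElement.ID).ToArray();
    });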

score 0 · Answer 2 · answered Apr 25 '19 at 07:23

You forgot to say why you want to keep item 444 and not item 111 instead of the other way around.

LINQ is designed to query data. It never changes the original source sequence.

You can use LINQ to query the items that you want to remove, and then use a foreach to remove the items one by one.
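
As a rough sketch of this query-then-remove pattern applied to the table in the question: the class and method names below are made up for illustration, the ID column is assumed to be an int, and "keep the row with the highest ID" is only an assumed rule, since the question does not state the real criterion.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DuplicateRowRemover
{
    // Builds the sample table from the question (ID assumed to be an int column).
    public static DataTable BuildSampleTable()
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Value", typeof(string));
        table.Rows.Add(111, "A");
        table.Rows.Add(222, "B");
        table.Rows.Add(333, "C");
        table.Rows.Add(444, "A");
        return table;
    }

    // Removes every row that shares its Value with a row that has a higher ID.
    // Assumption: the row with the highest ID is the one to keep.
    public static void RemoveAllButHighestId(DataTable table)
    {
        // Step 1: query. LINQ only selects rows; ToList() materializes the
        // result so the table is not enumerated while rows are removed.
        List<DataRow> rowsToRemove = table.AsEnumerable()
            .GroupBy(row => row.Field<string>("Value"))
            .SelectMany(g => g.OrderByDescending(row => row.Field<int>("ID")).Skip(1))
            .ToList();

        // Step 2: modify the table, one row at a time.
        foreach (DataRow row in rowsToRemove)
        {
            table.Rows.Remove(row);
        }
    }
}
```

Calling DuplicateRowRemover.RemoveAllButHighestId on the sample table removes row 111 and leaves 222, 333 and 444. Groups with a single row are untouched, because Skip(1) yields nothing for them.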

Querying the items with duplicates is easy. If you need this functionality often, consider creating an extension method for it:

static IEnumerable<IGrouping<TKey, TSource>> GetDuplicates<TSource, TKey>(
   this IEnumerable<TSource> source,
   Func<TSource, TKey> propertySelector)
{
    // TODO: check source and propertySelector not null

    // make groups of source items that have the same value for property:
    return source.GroupBy(item => propertySelector(item))

        // keep only the groups that have more than one element
        // it would be a waste to Count(), just stop after counting more than one
        .Where(group => group.Skip(1).Any());
}

This will give you groups of all source items that have duplicate values for the selected property.

In your case:

var itemsWithDuplicateValues = mySourceItems.GetDuplicates(item => item.Value);

This will give you all your source items that have duplicate values for item.Value, grouped by same item.Value
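
To make the result concrete, here is a small self-contained sketch that runs a GetDuplicates method on the data from the question. The tuple-based items, the class names and the Describe helper are assumptions made for the example, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableDuplicateExtensions
{
    // Returns only the groups that contain more than one element.
    public static IEnumerable<IGrouping<TKey, TSource>> GetDuplicates<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> propertySelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (propertySelector == null) throw new ArgumentNullException(nameof(propertySelector));

        return source.GroupBy(propertySelector)
                     .Where(group => group.Skip(1).Any());
    }
}

public static class GetDuplicatesDemo
{
    // Hypothetical helper: formats each duplicate group as "key: id, id, ...".
    public static List<string> Describe()
    {
        var items = new[]
        {
            (Id: 111, Value: "A"),
            (Id: 222, Value: "B"),
            (Id: 333, Value: "C"),
            (Id: 444, Value: "A"),
        };

        return items.GetDuplicates(item => item.Value)
                    .Select(g => g.Key + ": " + string.Join(", ", g.Select(i => i.Id)))
                    .ToList();
    }
}
```

For the sample data, Describe returns a single entry, "A: 111, 444". The values B and C each occur once, so their groups are filtered out.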

Once you have worked out why you want to keep the item with Id 444 and not 111, you can write a function that takes a group of duplicates and returns the elements that you want to remove.

static IEnumerable<TSource> SelectItemsIWantToRemove<TSource>(
   IEnumerable<TSource> source)
{
     // TODO: check source not null
     // select the items that you want to remove:
     foreach (var item in source)
     {
         if (IWantToRemove(item))
           yield return item;
     }
     // TODO: make sure there is always one item that you want to keep
     // or decide what to do if there isn't any item that you want to keep
}

// Placeholder, not a real API: put your own removal condition here
static bool IWantToRemove<TSource>(TSource item)
{
    throw new NotImplementedException("decide which items to remove");
}

Now that you've got a function that selects the items that you want to remove, it is easy to create a LINQ-like method that selects, from your sequence of duplicate groups, the items that you want to remove:

static IEnumerable<TSource> WhereIWantToRemove<TKey, TSource>(
   this IEnumerable<IGrouping<TKey, TSource>> duplicateGroups)
{
    foreach (var group in duplicateGroups)
    {
        // each group is itself a sequence of source items
        foreach (var sourceItem in SelectItemsIWantToRemove(group))
        {
            yield return sourceItem;
        }
    }
}

You could also use a SelectMany for this.
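
A minimal sketch of that SelectMany variant is below. To keep it self-contained, the rule for picking the items to remove is passed in as a delegate. That selectItemsToRemove parameter is an assumption of this sketch and stands in for the SelectItemsIWantToRemove function above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateGroupExtensions
{
    // Flattens the duplicate groups into one sequence of items to remove.
    // selectItemsToRemove is a hypothetical parameter: it receives one group
    // and returns the items of that group that should be removed.
    public static IEnumerable<TSource> WhereIWantToRemove<TKey, TSource>(
        this IEnumerable<IGrouping<TKey, TSource>> duplicateGroups,
        Func<IEnumerable<TSource>, IEnumerable<TSource>> selectItemsToRemove)
    {
        // SelectMany replaces the two nested foreach loops
        return duplicateGroups.SelectMany(group => selectItemsToRemove(group));
    }
}
```

For example, duplicateGroups.WhereIWantToRemove(g => g.OrderByDescending(i => i.Id).Skip(1)) would return everything except the item with the highest Id in each group, assuming the items have an Id property.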

Now put everything together:

static IEnumerable<TSource> WhereIWantToRemove<TSource, TKey>(
   this IEnumerable<TSource> source,
   Func<TSource, TKey> propertySelector)
{
    return source.GetDuplicates(propertySelector)
        .WhereIWantToRemove();
}

Usage:

var itemsToRemove = mySourceItems.WhereIWantToRemove(item => item.Value);

You can see that I chose to create several fairly small and easy to understand extension methods. Of course you can put them all together in one big LINQ statement. However, I'm not sure you could convince your project leader that this would make your code more readable, testable, maintainable and reusable. So my advice would be to stick to the small extension methods.

score 0 · Answer 3 · answered Apr 25 '19 at 18:47

You can group the DataRows by Value and then select all the rows that don't match your conditions, and then delete all those rows:

var result = items.AsEnumerable()
                  .GroupBy(item => item.Field<string>("Value"))
                  .Where(g => g.Count() > 1)
                  .SelectMany(g => g.Where(x => !x.Field<string>("ID").StartsWith("4")));
foreach (var r in result) {
    r.Delete();
}
