There are two problems in your code:
1. You're converting the List of DataX objects to an anonymous type (the new { x.user_id, x.date, x.application_ID }). That type is not DataX, and it can't be converted back to a DataX object automatically.
2. Reading between the lines a little, it looks like you want a distinct list of DataX objects, where distinctness is determined by a subset of the properties of a DataX object. So you have to answer the question: what will you do with duplicates (by this definition) that have different data in their other properties? You have to discard some of them. Distinct() on its own is not the right tool for this, because it compares each element of the IEnumerable as a whole and gives you no control over which of the duplicates is kept.
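To make the first point concrete, here is a minimal sketch. The DataX class in it is a hypothetical stand-in: only the three property names come from the question, while their types and the extra payload property are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's DataX; property types are assumed.
public class DataX {
    public int user_id { get; set; }
    public DateTime date { get; set; }
    public int application_ID { get; set; }
    public string payload { get; set; } // extra data that is not part of the key
}

public static class AnonymousTypeDemo {
    public static int CountDistinctKeys(IEnumerable<DataX> rows) {
        // The projection produces a compiler-generated anonymous type, not DataX.
        var keys = rows
            .Select(x => new { x.user_id, x.date, x.application_ID })
            .Distinct() // anonymous types compare by property values
            .ToList();

        // IList<DataX> wrong = keys; // does not compile: no conversion to DataX
        return keys.Count;
    }

    public static void Main() {
        var day = new DateTime(2020, 1, 1);
        var rows = new List<DataX> {
            new DataX { user_id = 1, date = day, application_ID = 7, payload = "a" },
            new DataX { user_id = 1, date = day, application_ID = 7, payload = "b" },
            new DataX { user_id = 2, date = day, application_ID = 7, payload = "c" }
        };
        Console.WriteLine(CountDistinctKeys(rows)); // prints 2
    }
}
```

The distinct keys come out fine, but the payload values are gone and there is no DataX left to return, which is why the projection has to be avoided.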
It's almost as if you need a DistinctBy with one parameter giving the properties to calculate distinctness with, and a second parameter giving some logic for deciding which of the non-distinct "duplicates" to select. But this can be achieved with existing IEnumerable methods: GroupBy, followed by a further expression that selects a single appropriate item from each resulting group. Here's one possible solution:
FileName = Path.GetFileName(files[i]);
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName)
.GroupBy(datax => new { datax.user_id, datax.date, datax.application_ID })
.Select(g => g.First()) // or another expression to choose one item per group
.ToList();
If, for example, there were a version field and you wanted the most recent one for each "duplicate", you could use:
.Select(g => g.OrderByDescending(datax => datax.version).First())
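Put together as something you can run, the GroupBy approach might look like the sketch below. The DataX class is again a hypothetical stand-in, and the integer version property is an assumption made only to show the "pick the most recent" variant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical DataX with an assumed integer version property.
public class DataX {
    public int user_id { get; set; }
    public DateTime date { get; set; }
    public int application_ID { get; set; }
    public int version { get; set; }
}

public static class GroupByDemo {
    // Keeps the highest-version item for each (user_id, date, application_ID) key.
    public static IList<DataX> LatestPerKey(IEnumerable<DataX> rows) {
        return rows
            .GroupBy(datax => new { datax.user_id, datax.date, datax.application_ID })
            .Select(g => g.OrderByDescending(datax => datax.version).First())
            .ToList();
    }

    public static void Main() {
        var day = new DateTime(2020, 1, 1);
        var rows = new List<DataX> {
            new DataX { user_id = 1, date = day, application_ID = 7, version = 1 },
            new DataX { user_id = 1, date = day, application_ID = 7, version = 3 },
            new DataX { user_id = 2, date = day, application_ID = 7, version = 2 }
        };
        foreach (var d in LatestPerKey(rows)) {
            Console.WriteLine($"{d.user_id} v{d.version}");
        }
        // prints "1 v3" then "2 v2"
    }
}
```

The anonymous type is used only as the grouping key here; the items that come out of each group are still DataX objects, so the result can be assigned to IList<DataX>.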
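Please note, however, that if you just want distinctness over all the properties of the object, and there is no need to pick one particular item out of each set of duplicates (in order to keep its additional properties), then it may be as simple as the following. This only works as intended if DataX defines value equality (it overrides Equals and GetHashCode, or implements IEquatable<DataX>); otherwise Distinct() falls back to reference equality and removes only repeated references to the same instance: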
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName)
.Distinct()
.ToList();
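For Distinct() to treat two separate DataX instances as equal, the class has to define value equality. Here is a minimal sketch of one way to do that, assuming a hypothetical DataX with just the three properties from the question (their types are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical DataX that defines value equality over all of its properties.
public class DataX : IEquatable<DataX> {
    public int user_id { get; set; }
    public DateTime date { get; set; }
    public int application_ID { get; set; }

    public bool Equals(DataX other) {
        return other != null
            && user_id == other.user_id
            && date == other.date
            && application_ID == other.application_ID;
    }

    public override bool Equals(object obj) {
        return Equals(obj as DataX);
    }

    // Equal objects must produce equal hash codes, so hash the same properties.
    public override int GetHashCode() {
        unchecked {
            int hash = 17;
            hash = hash * 31 + user_id;
            hash = hash * 31 + date.GetHashCode();
            hash = hash * 31 + application_ID;
            return hash;
        }
    }
}

public static class DistinctDemo {
    public static int CountDistinct(IEnumerable<DataX> rows) {
        return rows.Distinct().Count();
    }

    public static void Main() {
        var day = new DateTime(2020, 1, 1);
        var rows = new List<DataX> {
            new DataX { user_id = 1, date = day, application_ID = 7 },
            new DataX { user_id = 1, date = day, application_ID = 7 }, // same values, different instance
            new DataX { user_id = 2, date = day, application_ID = 7 }
        };
        Console.WriteLine(CountDistinct(rows)); // prints 2
    }
}
```

Without the Equals and GetHashCode overrides, the same call would count 3, because each instance is only equal to itself.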
I would furthermore advise that you use IReadOnlyCollection where possible (that's .ToList().AsReadOnly()) and that, depending on your data, you may want to make the GetListFromFTP function perform the de-duplication itself.
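As a small illustration of the read-only suggestion, the sketch below returns the de-duplicated result as IReadOnlyCollection. The Dedupe helper and the use of plain integers are assumptions chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReadOnlyDemo {
    // Hypothetical helper: callers get a collection they can count and
    // enumerate but cannot modify through this interface.
    public static IReadOnlyCollection<int> Dedupe(IEnumerable<int> values) {
        // List<T>.AsReadOnly() returns a ReadOnlyCollection<T> wrapper,
        // which implements IReadOnlyCollection<T>.
        return values.Distinct().ToList().AsReadOnly();
    }

    public static void Main() {
        IReadOnlyCollection<int> result = Dedupe(new[] { 1, 1, 2, 3, 3 });
        Console.WriteLine(result.Count); // prints 3
    }
}
```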
To answer any concerns that GroupBy isn't the right answer because it may not perform well enough, here is an alternate way to handle this (though I wholeheartedly disagree with that concern: until tests prove it's slow, GroupBy is a perfectly fine answer).
// in a static helper class of some kind
public static IEnumerable<T> DistinctBy<T, TKey>(
this IEnumerable<T> source,
Func<T, TKey> keySelector
) {
if (source == null) {
throw new ArgumentNullException("source", "Source enumerable cannot be null.");
}
if (keySelector == null) {
throw new ArgumentNullException("keySelector", "keySelector function cannot be null. To perform a generic distinct, use .Distinct().");
}
return DistinctByImpl(source, keySelector);
}
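private static IEnumerable<T> DistinctByImpl<T, TKey>(
IEnumerable<T> source,
Func<T, TKey> keySelector
) {
// The set lives inside the iterator, so every enumeration starts with a fresh one.
HashSet<TKey> keys = new HashSet<TKey>();
foreach (T item in source) {
// HashSet.Add returns false for a key that has already been seen.
if (keys.Add(keySelector(item))) {
yield return item;
}
}
}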
It is used like this:
public class Animal {
public string Name { get; set; }
public string AnimalType { get; set; }
public decimal Weight { get; set; }
}
IEnumerable<Animal> animals = new List<Animal> {
new Animal { Name = "Fido", AnimalType = "Dog", Weight = 15.0M },
new Animal { Name = "Trixie", AnimalType = "Dog", Weight = 15.0M },
new Animal { Name = "Juliet", AnimalType = "Cat", Weight = 12.0M },
new Animal { Name = "Juliet", AnimalType = "Fish", Weight = 1.0M }
};
var filtered1 = animals.DistinctBy(a => new { a.AnimalType, a.Weight });
/* returns:
Name Type Weight
Fido Dog 15.0
Juliet Cat 12.0
Juliet Fish 1.0
*/
var filtered2 = animals.DistinctBy(a => a.Name); // or a simple property
/* returns:
Name Type Weight
Fido Dog 15.0
Trixie Dog 15.0
Juliet Cat 12.0
*/
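If you are on .NET 6 or later, System.Linq already includes an Enumerable.DistinctBy method with the same shape (a single key selector), so the hand-written helper may be unnecessary there and may need to be renamed or removed to avoid an ambiguous call. The sketch below assumes .NET 6+ and reuses the Animal class from above. In practice it keeps the first element seen for each key, though you should not rely on that if the choice of item matters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Animal {
    public string Name { get; set; }
    public string AnimalType { get; set; }
    public decimal Weight { get; set; }
}

public static class BuiltInDistinctByDemo {
    public static List<Animal> OnePerName(IEnumerable<Animal> animals) {
        // Enumerable.DistinctBy ships with System.Linq in .NET 6 and later.
        return animals.DistinctBy(a => a.Name).ToList();
    }

    public static void Main() {
        var animals = new List<Animal> {
            new Animal { Name = "Fido", AnimalType = "Dog", Weight = 15.0M },
            new Animal { Name = "Trixie", AnimalType = "Dog", Weight = 15.0M },
            new Animal { Name = "Juliet", AnimalType = "Cat", Weight = 12.0M },
            new Animal { Name = "Juliet", AnimalType = "Fish", Weight = 1.0M }
        };
        Console.WriteLine(OnePerName(animals).Count); // prints 3
    }
}
```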