
I need to return only the records that have a unique AccessionNumber, together with its corresponding LoginId, so that in the end the data looks something like this:

  • A1,L1
  • A2,L1
  • A3,L2

However, my problem is with the line below. Because of the Select(x => x.AccessionNumber) in front of it, Distinct() returns an IEnumerable<string> rather than a sequence of records, so record is a string. Therefore, the compiler complains that string does not contain a definition for AccessionNumber and LoginId.

yield return new[] { record.AccessionNumber, record.LoginId };

This is the code that I am trying to execute:

    internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
    {
        IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
        data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
        var z = data.Select(x => x.AccessionNumber).Distinct();

        foreach (var record in z)
        {
            yield return new[] { record.AccessionNumber, record.LoginId };
        }
    }
Nikolay Advolodkin
  • What do you want to do with your `Select` statement where you transform from `StudentAssessmentTestData` to `string` (assuming `AccessionNumber` is of type string)? – Icepickle Jun 15 '17 at 21:30
  • Possible duplicate of https://stackoverflow.com/questions/489258/linqs-distinct-on-a-particular-property – NetMage Jun 16 '17 at 00:28

7 Answers


That's because you are selecting only the AccessionNumber property here:

var z = data.Select(x => x.AccessionNumber).Distinct();

You probably want to select the entire StudentAssessmentTestData record:

data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString()).Distinct();

    foreach (var record in data)
    {
        yield return new[] { record.AccessionNumber, record.LoginId };
    }
Rahul
  • Still brings back all of the AccessionNumber values, even non-unique – Nikolay Advolodkin Jun 15 '17 at 21:25
  • @NikolayAdvolodkin, you most probably don't want `Distinct` but rather something like SQL's `ROW_NUMBER()` function. See this post https://stackoverflow.com/questions/9980568/row-number-over-partition-by-xxx-in-linq – Rahul Jun 15 '17 at 21:28
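The first comment on this answer is expected behavior: with no comparer argument, `Distinct()` uses `EqualityComparer<T>.Default`, which for a class that does not override `Equals`/`GetHashCode` compares object references, so separate records holding identical values all survive. A minimal sketch of that behavior, assuming a simplified, hypothetical stand-in for `StudentAssessmentTestData` with only the two properties:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for StudentAssessmentTestData.
// No Equals/GetHashCode override, so the default comparer compares references.
public class StudentAssessmentTestData
{
    public string AccessionNumber { get; set; } = "";
    public string LoginId { get; set; } = "";
}

public static class DistinctDemo
{
    public static List<StudentAssessmentTestData> SampleRows() => new List<StudentAssessmentTestData>
    {
        new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L1" },
        new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L1" }, // same values, different object
        new StudentAssessmentTestData { AccessionNumber = "A2", LoginId = "L1" },
    };

    public static void Main()
    {
        var rows = SampleRows();

        // Reference equality: all three objects are "distinct".
        Console.WriteLine(rows.Distinct().Count()); // 3

        // Projecting to the key first does deduplicate, but only strings remain,
        // which is the compile error described in the question.
        Console.WriteLine(rows.Select(r => r.AccessionNumber).Distinct().Count()); // 2
    }
}
```

Even a class with value equality over all properties would still keep rows that share an AccessionNumber but differ in LoginId, which is why the other answers key the deduplication on AccessionNumber alone.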

Instead of using Distinct, use GroupBy. This:

var z = data.Select(x => x.AccessionNumber).Distinct();

foreach (var record in z)
{
    yield return new[] { record.AccessionNumber, record.LoginId };
}

should be something like this:

// replaces the foreach/yield loop; a method that returns like this cannot also contain "yield return"
return data.GroupBy(x => x.AccessionNumber)
    .Select(g => new[] { g.Key, g.First().LoginId });

The GroupBy() call ensures there is only one entry per AccessionNumber, and First() picks the LoginId of the first record in each group.

This assumes your data is ordered so that, when several logins share the same AccessionNumber, the first one is the correct one.
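A small, self-contained run of the GroupBy/First approach, assuming a simplified, hypothetical `StudentAssessmentTestData` with only the two properties and made-up sample rows. `Enumerable.GroupBy` yields groups in the order their keys first appear and keeps source order inside each group, so `First()` really is the earliest record for that AccessionNumber.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the question's StudentAssessmentTestData.
public class StudentAssessmentTestData
{
    public string AccessionNumber { get; set; } = "";
    public string LoginId { get; set; } = "";
}

public static class GroupByFirstDemo
{
    // One { AccessionNumber, LoginId } pair per accession number,
    // taking the LoginId of the first record in each group.
    public static IEnumerable<string[]> FirstPerAccession(IEnumerable<StudentAssessmentTestData> data) =>
        data.GroupBy(x => x.AccessionNumber)
            .Select(g => new[] { g.Key, g.First().LoginId });

    public static void Main()
    {
        var data = new List<StudentAssessmentTestData>
        {
            new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L1" },
            new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L2" },
            new StudentAssessmentTestData { AccessionNumber = "A2", LoginId = "L1" },
            new StudentAssessmentTestData { AccessionNumber = "A3", LoginId = "L2" },
        };

        // Prints A1,L1 then A2,L1 then A3,L2
        foreach (var pair in FirstPerAccession(data))
            Console.WriteLine(string.Join(",", pair));
    }
}
```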

mrfelis

If you want to choose distinct values based on a certain property you can do it in several ways.

If it is always the same property you want to compare on, you can override the Equals and GetHashCode methods in the StudentAssessmentTestData class, which lets the Distinct method recognize when two instances should be treated as equal. An example can be found in this question.
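A minimal sketch of that override, assuming a simplified, hypothetical version of `StudentAssessmentTestData` whose equality is defined by AccessionNumber alone. With it, the parameterless `Distinct()` keeps one record per AccessionNumber.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

#nullable enable

// Hypothetical variant of StudentAssessmentTestData: equality looks only at AccessionNumber.
public class StudentAssessmentTestData : IEquatable<StudentAssessmentTestData>
{
    public string AccessionNumber { get; set; } = "";
    public string LoginId { get; set; } = "";

    public bool Equals(StudentAssessmentTestData? other) =>
        other is not null && AccessionNumber == other.AccessionNumber;

    public override bool Equals(object? obj) => Equals(obj as StudentAssessmentTestData);

    // Must agree with Equals: equal objects need equal hash codes.
    public override int GetHashCode() => AccessionNumber?.GetHashCode() ?? 0;
}

public static class EqualsOverrideDemo
{
    public static List<StudentAssessmentTestData> Sample() => new List<StudentAssessmentTestData>
    {
        new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L1" },
        new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L2" },
        new StudentAssessmentTestData { AccessionNumber = "A2", LoginId = "L1" },
    };

    public static void Main()
    {
        // Distinct() now uses the overridden equality: one record per AccessionNumber.
        foreach (var d in Sample().Distinct())
            Console.WriteLine($"{d.AccessionNumber},{d.LoginId}");
    }
}
```

Because this changes equality for the class everywhere it is compared (Contains, HashSet, dictionary keys), a dedicated comparer is often less intrusive when only one query needs it. Also avoid changing AccessionNumber on an instance that already sits in a hash-based collection.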

Alternatively, you can implement a custom IEqualityComparer<T>, for example the following version:

// Custom comparer taking generic input parameter and a delegate function to do matching
public class CustomComparer<T> : IEqualityComparer<T> {
    private readonly Func<T, object> _match;

    public CustomComparer(Func<T, object> match) {
        _match = match;
    }

    // compares the keys extracted from both arguments
    public bool Equals(T data1, T data2) {
        return object.Equals(_match(data1), _match(data2));
    }

    // overly simplistic implementation
    public int GetHashCode(T data) {
        var matchValue = _match(data);
        if (matchValue == null) {
            return 42.GetHashCode();
        }
        return matchValue.GetHashCode();
    }
}

An instance of this class can then be passed to Distinct, for example:

// compare by accession number
var accessComparer = new CustomComparer<StudentAssessmentTestData>(d => d.AccessionNumber);
// compare by login id
var loginComparer = new CustomComparer<StudentAssessmentTestData>(d => d.LoginId);

foreach (var d in data.Distinct( accessComparer )) {
    Console.WriteLine( "{0}, {1}", d.AccessionNumber, d.LoginId);
}

foreach (var d in data.Distinct( loginComparer )) {
    Console.WriteLine( "{0}, {1}", d.AccessionNumber, d.LoginId);
}

A full example can be found in this dotnetfiddle.

Icepickle

Add a DistinctBy extension method, as below.

public static class LinqExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

Use it in your code like this:

var z = data.DistinctBy(x => x.AccessionNumber);

internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
    IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
    data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
    var z = data.DistinctBy(x => x.AccessionNumber);

    foreach (var record in z)
    {
        yield return new[] { record.AccessionNumber, record.LoginId };
    }
}
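A self-contained run of the `DistinctBy` approach, assuming a simplified, hypothetical `StudentAssessmentTestData` with only the two properties. Note that .NET 6 and later already ship a built-in `Enumerable.DistinctBy` in `System.Linq`; this sketch calls the hand-written method through its class name so it cannot be confused with the built-in one, whatever the target framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the question's StudentAssessmentTestData.
public class StudentAssessmentTestData
{
    public string AccessionNumber { get; set; } = "";
    public string LoginId { get; set; } = "";
}

// The DistinctBy extension from the answer: yields an element the first time its key is seen.
public static class LinqExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // HashSet.Add returns false when the key is already present.
            if (seenKeys.Add(keySelector(element)))
                yield return element;
        }
    }
}

public static class DistinctByDemo
{
    public static List<StudentAssessmentTestData> Sample() => new List<StudentAssessmentTestData>
    {
        new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L1" },
        new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L2" },
        new StudentAssessmentTestData { AccessionNumber = "A2", LoginId = "L1" },
        new StudentAssessmentTestData { AccessionNumber = "A3", LoginId = "L2" },
    };

    public static void Main()
    {
        // Prints A1,L1 then A2,L1 then A3,L2
        foreach (var record in LinqExtensions.DistinctBy(Sample(), x => x.AccessionNumber))
            Console.WriteLine($"{record.AccessionNumber},{record.LoginId}");
    }
}
```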
Ayaz

This is the code that finally worked:

internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
    var data = DataGetter.GetTestData("MyTestData");
    data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
    var z = data.GroupBy(x => new { x.AccessionNumber })
        .Select(x => new StudentAssessmentTestData() { AccessionNumber = x.Key.AccessionNumber, LoginId = x.FirstOrDefault().LoginId });

    foreach (var record in z)
    {
        yield return new[] { record.AccessionNumber, record.LoginId };
    }
}

This returns a sequence similar to this:

  • Acc1, Login1
  • Acc2, Login1
  • Acc3, Login2
  • Acc4, Login1
  • Acc5, Login3
Nikolay Advolodkin
  • If I understood your question correctly, this isn't what you asked for. I thought you were asking for the records returned where there was a unique `AccessionNumber` - in other words, if two or more records had the same `AccessionNumber` then don't return them. This answer returns only the first record for each `AccessionNumber`. Which was it that you wanted? – Enigmativity Jun 19 '17 at 04:19
  • It seems to be the same to me. I wanted one of every single accession number with the corresponding Login without having duplicate values of Accession number. This is exactly what I get with this answer. – Nikolay Advolodkin Jun 20 '17 at 11:05
  • OK, in your question you say "only the records that have a unique `AccessionNumber`" which means if two records have the same `AccessionNumber` then both those records do not have unique values so they shouldn't be returned. That was where my confusion was. – Enigmativity Jun 20 '17 at 11:32
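The code above groups by an anonymous object, `new { x.AccessionNumber }`, rather than by the string itself. That works because C# anonymous types override `Equals` and `GetHashCode` to compare their property values, so records with the same AccessionNumber land in the same group; grouping directly on `x.AccessionNumber` is equivalent. Likewise, `FirstOrDefault()` on a group never returns null because groups are never empty, so `First()` would behave the same. A quick check of the anonymous-type equality, using made-up values:

```csharp
using System;
using System.Linq;

public static class AnonymousKeyDemo
{
    public static void Main()
    {
        var a = new { AccessionNumber = "A1" };
        var b = new { AccessionNumber = "A1" };

        // Anonymous types compare by value: same property names, types and values => equal.
        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(ReferenceEquals(a, b));              // False: still two objects

        // Hence grouping on new { key } and on key itself gives the same groups.
        var rows = new[] { ("A1", "L1"), ("A1", "L2"), ("A2", "L1") };
        int byAnonymous = rows.GroupBy(r => new { AccessionNumber = r.Item1 }).Count();
        int byString = rows.GroupBy(r => r.Item1).Count();
        Console.WriteLine(byAnonymous == byString);            // True (both are 2)
    }
}
```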

You can try this. It works for me.

IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.GroupBy(x => x.AccessionNumber).SelectMany(y => y.Take(1));

foreach (var record in z)
{
    yield return new[] { record.AccessionNumber, record.LoginId };
}
Ayaz

I'm not 100% sure what you're asking. You want either (1) only the records whose AccessionNumber is unique, so that if two or more records share an AccessionNumber none of them is returned, or (2) only the first record for each AccessionNumber.

Here are both options:

(1)

internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
    return
        DataGetter
            .GetTestData("MyTestData")
            .Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString())
            .GroupBy(x => x.AccessionNumber)
            .Where(x => !x.Skip(1).Any())
            .SelectMany(x => x)
            .Select(x => new [] { x.AccessionNumber, x.LoginId });
}

(2)

internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
    return
        DataGetter
            .GetTestData("MyTestData")
            .Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString())
            .GroupBy(x => x.AccessionNumber)
            .SelectMany(x => x.Take(1))
            .Select(x => new [] { x.AccessionNumber, x.LoginId });
}
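To see how the two options differ, here is a small harness over made-up sample data in which A1 appears twice: option (1) drops A1 entirely, while option (2) keeps its first row. `StudentAssessmentTestData` is a simplified, hypothetical stand-in with only the two properties, and the filtering by ItemTypeCode is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the question's StudentAssessmentTestData.
public class StudentAssessmentTestData
{
    public string AccessionNumber { get; set; } = "";
    public string LoginId { get; set; } = "";
}

public static class UniqueVsFirstDemo
{
    // Option (1): keep only records whose AccessionNumber occurs exactly once.
    public static IEnumerable<string[]> OnlyUnique(IEnumerable<StudentAssessmentTestData> data) =>
        data.GroupBy(x => x.AccessionNumber)
            .Where(g => !g.Skip(1).Any()) // the group has no second element
            .SelectMany(g => g)
            .Select(x => new[] { x.AccessionNumber, x.LoginId });

    // Option (2): keep the first record of every AccessionNumber.
    public static IEnumerable<string[]> FirstOfEach(IEnumerable<StudentAssessmentTestData> data) =>
        data.GroupBy(x => x.AccessionNumber)
            .SelectMany(g => g.Take(1))
            .Select(x => new[] { x.AccessionNumber, x.LoginId });

    public static List<StudentAssessmentTestData> Sample() => new List<StudentAssessmentTestData>
    {
        new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L1" },
        new StudentAssessmentTestData { AccessionNumber = "A1", LoginId = "L2" },
        new StudentAssessmentTestData { AccessionNumber = "A2", LoginId = "L1" },
        new StudentAssessmentTestData { AccessionNumber = "A3", LoginId = "L2" },
    };

    public static void Main()
    {
        var data = Sample();
        Console.WriteLine(string.Join(" ", OnlyUnique(data).Select(p => string.Join(",", p))));  // A2,L1 A3,L2
        Console.WriteLine(string.Join(" ", FirstOfEach(data).Select(p => string.Join(",", p)))); // A1,L1 A2,L1 A3,L2
    }
}
```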
Enigmativity