I have a list of key/value pairs, where some of the values are duplicates. I would like to remove the pairs whose values are duplicated, leaving only one pair per value. I have followed THIS post from SO, but I cannot seem to get it working properly. When I debug, the new list contains exactly the same entries as the original. The complete class code is shown below:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace MappingCodeImportHelper
{
    public class CodeMappingHelper
    {
    private List<KeyValuePair<int, string>> TargetJobCodeParallel { get; set; }
    private List<KeyValuePair<int, string>> SourceJobCodeParallel { get; set; }
    private List<KeyValuePair<int, string>> SourceJobCode_Distinct { get; set; }
    private StringBuilder TargetJobCodeOutputString { get; set; }
    private StringBuilder SourceJobCodeOutputString { get; set; }
    private string PathToFiles {get; set;}
    private string SourceFileName { get; set; }
    private string TargetFileName { get; set; }

    public CodeMappingHelper(string sourceJobCodeFileName, string targetJobCodeFileName)
    {
        this.SourceFileName = "\\" + sourceJobCodeFileName;
        this.TargetFileName = "\\" + targetJobCodeFileName;
        this.TargetJobCodeParallel = new List<KeyValuePair<int, string>>();
        this.SourceJobCodeParallel = new List<KeyValuePair<int, string>>();
        this.SourceJobCode_Distinct = new List<KeyValuePair<int, string>>();
    }


    internal void ImportCodesFromFile()
    {
        GetFilePaths();
        ReadInCodesFromFile();
    }

    private void ReadInCodesFromFile()
    {
        var digits = new[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };          
        // read the target job codes, trimming trailing digits from each line
        using (StreamReader Reader = new StreamReader(PathToFiles + TargetFileName))
        {

            int counter = 0;
            string curLine = "";
            while( (curLine = Reader.ReadLine()) != null) 
            {
                if (curLine.IndexOf('-') == -1)
                    TargetJobCodeParallel.Add(new KeyValuePair<int, string>(counter, curLine.TrimEnd(digits)));
                else
                    TargetJobCodeParallel.Add(new KeyValuePair<int, string>(counter, curLine.Substring(0, curLine.IndexOf('-') + 1).TrimEnd(digits)));

                 ++counter;
            }

        }

        // read the source job codes the same way
        using (StreamReader Reader = new StreamReader(PathToFiles + SourceFileName))
        {
            int counter = 0;
            string curLine = "";
            while ((curLine = Reader.ReadLine()) != null)
            {
                if (curLine.IndexOf('-') == -1)
                    SourceJobCodeParallel.Add(new KeyValuePair<int, string>(counter, curLine.TrimEnd(digits)));
                else
                    SourceJobCodeParallel.Add(new KeyValuePair<int, string>(counter, curLine.Substring(0, curLine.LastIndexOf('-') + 1).TrimEnd(digits)));

                ++counter;
            }
        }
    }

    private void GetFilePaths()
    {
         PathToFiles = Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location);
    }

    internal void MakeDistinctMaster()
    {
        //SourceJobCode_Distinct.AddRange(SourceJobCodeParallel.Where(keyPair => !SourceJobCode_Distinct.Contains(keyPair)));
        SourceJobCode_Distinct = SourceJobCodeParallel.Distinct().ToList();
    }
  }
}

In the Program.cs file, add the lines below, changing the source and target file names to whatever you wish.

CodeMappingHelper mappingHelper = new CodeMappingHelper("JobCodeSourceDB.txt", "JobCodeTargetDB.txt");

mappingHelper.ImportCodesFromFile();
mappingHelper.MakeDistinctMaster();
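
If it helps, here is a minimal, hard-coded repro of what I am seeing, with no file reading involved (the values are invented, just to mirror my data):

using System;
using System.Collections.Generic;
using System.Linq;

class DistinctRepro
{
    static void Main()
    {
        // hard-coded stand-in for what ReadInCodesFromFile produces
        var source = new List<KeyValuePair<int, string>>
        {
            new KeyValuePair<int, string>(0, "CoordHAF"),
            new KeyValuePair<int, string>(1, "ResidentialCnsl"),
            new KeyValuePair<int, string>(2, "ResidentialCnsl"),
            new KeyValuePair<int, string>(3, "NurseRXN")
        };

        var distinct = source.Distinct().ToList();

        // I expect 3 entries here (only one "ResidentialCnsl"), but I get all 4 pairs back
        Console.WriteLine(distinct.Count);
    }
}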

ALSO, the files MUST be in the bin/debug folder, as I am using:

PathToFiles = Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location);

In the source file, add (sample data):

CoordHAF01
NurseRXN01
PresCEO01
ResidentialCnsl01
SenSecretary01
VPClinServ01
SeniorCaseMgr01
CoordIntakeClin01
ResidentialCnsl23
ResidentialCnsl24

The Target DB info is irrelevant at this point. When the data is read in, and after the MakeDistinctMaster() function has finished running, I would like the SourceJobCode_Distinct list to hold the values below, essentially getting rid of the second and third ResidentialCnsl:

CoordHAF
NurseRXN
PresCEO
ResidentialCnsl
SenSecretary
VPClinServ
SeniorCaseMgr
CoordIntakeClin

As a side note, the commented-out line with the .AddRange call produces the same result. Is there an obvious reason I would not get a distinct list in the MakeDistinctMaster() function?
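
In case it clarifies what I am after, the behaviour I want out of MakeDistinctMaster() is roughly this GroupBy-based sketch (keep one pair per distinct value), though I would rather understand why the plain Distinct() call is not doing it:

SourceJobCode_Distinct = SourceJobCodeParallel
    .GroupBy(pair => pair.Value)     // group pairs that share the same value
    .Select(group => group.First())  // keep only the first pair from each group
    .ToList();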

If you want a fuller understanding of my situation--in case I am going about it in a terrible way and you have a better solution--a client is moving from one DB system to another. I was given an Excel list of job codes with one column from a source DB and a second column of the target job codes that will be in the target DB. The column on the left maps to the value in the column on the right, row for row.

However, for whatever reason, when the client made the list in Excel they put in every single employee and their source/target job code, with a "01" or "02" tacked on to the end of the source column, but not the target column. For instance, if there were 5 people in the "manager" job, the source column would have "Manager01", "Manager02", "Manager03", and so on, but the target column would just show "Mngr", "Mngr", "Mngr" 5 times.

So that left me having to truncate the numbers and get rid of the duplicate values in the source and target DB columns. When I attempted to do the distinct/unique part in Excel, Excel messed up the order, essentially destroying the mappings. This made me turn to a console app.

I decided to put both columns into their respective key/value pair lists so I could do the unique operation on only one list, then look at the leftover keys from that list (the keys being ints from 0 to n), apply them to the second list, and spit out two files with the final lists that actually map.
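
Concretely, the final step I have in mind is roughly this sketch (the output file names are just placeholders):

// keys that survived the dedup on the source list
var keptKeys = new HashSet<int>(SourceJobCode_Distinct.Select(pair => pair.Key));

// keep only the target pairs whose key survived, preserving the row-for-row mapping
var targetFiltered = TargetJobCodeParallel.Where(pair => keptKeys.Contains(pair.Key)).ToList();

// write the two final, still-aligned lists out next to the executable
File.WriteAllLines(Path.Combine(PathToFiles, "SourceJobCodes_Final.txt"),
    SourceJobCode_Distinct.Select(pair => pair.Value));
File.WriteAllLines(Path.Combine(PathToFiles, "TargetJobCodes_Final.txt"),
    targetFiltered.Select(pair => pair.Value));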

Is there a better/faster/more logical way to do this?

  • When you say you want to remove the duplicates - do you mean you want to remove *every* key/value pair where that value appears more than once? Or leave *one* such pair? If so, which? It would be really helpful if you could provide a [mcve] so we can see what you're expecting - as well as what you've tried so far. (Apparently you've tried something, but we don't know exactly what, or what went wrong.) – Jon Skeet Feb 22 '16 at 07:03
  • Ah, @JonSkeet, I want to leave one such pair. The reason I am doing key/value pairs is so that it does not matter which pair to leave. I am new to this, so give me a bit to read about how to do the example. – Jeff.Clark Feb 22 '16 at 07:07
  • But it *would* matter which you keep, if they have the same value but different keys. This is why a complete example would be so useful. (Just hard-code the sample data - we don't need any file reading.) – Jon Skeet Feb 22 '16 at 07:12
  • Ok @JonSkeet, should be a better question now. Let me know if there is anything I could to do make it even more complete/helpful. – Jeff.Clark Feb 22 '16 at 07:36
  • Also @JonSkeet, regarding your comment about it mattering which key/value pair I kept: In my last paragraph I mention that by operating on only one list, when I then write out the distinct target and source lists to files, I will only need to look at the keys from the list I operated on, and then write out the corresponding keys from the target list. I am quite aware that I could be thinking incorrectly, as I do it often. Is this incorrect? Is there a better way? – Jeff.Clark Feb 22 '16 at 07:44
  • I'm afraid I'm still having trouble trying to get from your long description to anything concrete. As I've said numerous times now, this would all be *much* clearer with a [mcve]. – Jon Skeet Feb 22 '16 at 08:10
