I have a list of key-value pairs, where some of the values are duplicates. I would like to remove the key-value pairs whose values are duplicates, leaving only one pair per value. I have followed THIS post from SO, but I cannot seem to get it working properly. When I debug, I see the exact same list in the new list. The complete class code is shown below:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace MappingCodeImportHelper
{
    public class CodeMappingHelper
    {
        private List<KeyValuePair<int, string>> TargetJobCodeParallel { get; set; }
        private List<KeyValuePair<int, string>> SourceJobCodeParallel { get; set; }
        private List<KeyValuePair<int, string>> SourceJobCode_Distinct { get; set; }
        private StringBuilder TargetJobCodeOutputString { get; set; }
        private StringBuilder SourceJobCodeOutputString { get; set; }
        private string PathToFiles { get; set; }
        private string SourceFileName { get; set; }
        private string TargetFileName { get; set; }

        public CodeMappingHelper(string sourceJobCodeFileName, string targetJobCodeFileName)
        {
            this.SourceFileName = "\\" + sourceJobCodeFileName;
            this.TargetFileName = "\\" + targetJobCodeFileName;
            this.TargetJobCodeParallel = new List<KeyValuePair<int, string>>();
            this.SourceJobCodeParallel = new List<KeyValuePair<int, string>>();
            this.SourceJobCode_Distinct = new List<KeyValuePair<int, string>>();
        }

        internal void ImportCodesFromFile()
        {
            GetFilePaths();
            ReadInCodesFromFile();
        }

        private void ReadInCodesFromFile()
        {
            var digits = new[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };

            // Read the target job codes into a parallel list keyed by line number (counter).
            using (StreamReader Reader = new StreamReader(PathToFiles + TargetFileName))
            {
                int counter = 0;
                string curLine = "";
                while ((curLine = Reader.ReadLine()) != null)
                {
                    if (curLine.IndexOf('-') == -1)
                        TargetJobCodeParallel.Add(new KeyValuePair<int, string>(counter, curLine.TrimEnd(digits)));
                    else
                        TargetJobCodeParallel.Add(new KeyValuePair<int, string>(counter, curLine.Substring(0, curLine.IndexOf('-') + 1).TrimEnd(digits)));
                    ++counter;
                }
            }
            // Read the source job codes into a parallel list keyed by line number (counter).
            using (StreamReader Reader = new StreamReader(PathToFiles + SourceFileName))
            {
                int counter = 0;
                string curLine = "";
                while ((curLine = Reader.ReadLine()) != null)
                {
                    if (curLine.IndexOf('-') == -1)
                        SourceJobCodeParallel.Add(new KeyValuePair<int, string>(counter, curLine.TrimEnd(digits)));
                    else
                        SourceJobCodeParallel.Add(new KeyValuePair<int, string>(counter, curLine.Substring(0, curLine.LastIndexOf('-') + 1).TrimEnd(digits)));
                    ++counter;
                }
            }
        }

        private void GetFilePaths()
        {
            PathToFiles = Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location);
        }

        internal void MakeDistinctMaster()
        {
            //SourceJobCode_Distinct.AddRange(SourceJobCodeParallel.Where(keyPair => !SourceJobCode_Distinct.Contains(keyPair)));
            SourceJobCode_Distinct = SourceJobCodeParallel.Distinct().ToList();
        }
    }
}
In the Program.cs file, add the lines below, changing the source and target file names to whatever you wish.
CodeMappingHelper mappingHelper = new CodeMappingHelper("JobCodeSourceDB.txt", "JobCodeTargetDB.txt");
mappingHelper.ImportCodesFromFile();
mappingHelper.MakeDistinctMaster();
ALSO, the files MUST be in the bin/Debug folder, as I am using:
PathToFiles = Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location);
In the source file, add (sample data):
CoordHAF01
NurseRXN01
PresCEO01
ResidentialCnsl01
SenSecretary01
VPClinServ01
SeniorCaseMgr01
CoordIntakeClin01
ResidentialCnsl23
ResidentialCnsl24
The Target DB info is irrelevant at this point. When the data is read in, and after MakeDistinctMaster() has finished running, I would like the SourceJobCode_Distinct list to hold the values below, essentially getting rid of the second and third ResidentialCnsl entries:
CoordHAF
NurseRXN
PresCEO
ResidentialCnsl
SenSecretary
VPClinServ
SeniorCaseMgr
CoordIntakeClin
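To rule out the file reading, the behavior boils down to a few lines without any file I/O (hypothetical console test code, not part of the class above):

// using System; using System.Collections.Generic; using System.Linq;
var pairs = new List<KeyValuePair<int, string>>
{
    new KeyValuePair<int, string>(0, "CoordHAF"),
    new KeyValuePair<int, string>(1, "ResidentialCnsl"),
    new KeyValuePair<int, string>(2, "ResidentialCnsl"),
};

// I would expect only 2 pairs to survive here, but all 3 come back,
// just like the full list does with the file data.
var distinct = pairs.Distinct().ToList();
Console.WriteLine(distinct.Count);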
As a side note, the commented-out line with the .AddRange call produces the same result. Is there an obvious reason I would not get a distinct list out of MakeDistinctMaster()?
If you want a fuller understanding of my situation (in case I am going about it in a terrible way and you have a better solution): a client is moving from one DB system to another. I was given an Excel list of job codes with one column from the source DB and a second column of the target job codes that will be in the target DB. The column on the left maps to the value in the column on the right, row for row.
However, for whatever reason, when the clients made the list in Excel they put in every single employee and their source/target job codes, with "01", "02", and so on tacked onto the end of the source column but not the target column. For instance, if there were 5 people in the "manager" job, the source column would have "Manager01", "Manager02", "Manager03", etc., but the target column would just show "Mngr" 5 times.
So that left me having to truncate the numbers and get rid of the duplicate values in the source and target DB columns. When I attempted to do the distinct/unique part in Excel, Excel messed up the order, essentially destroying the mappings. This made me turn to a console app.
I decided to put both columns into their own key-value pair lists so I could do the unique operation on only one list, then look at the leftover keys from that list (the keys being ints from 0 to n), apply them to the second list, and spit out two files with the final lists that actually map (a rough sketch of that last step is below).
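To be concrete about that last step, this is roughly what I was planning to add to the class once the distinct part works (the method name and output file names are just placeholders, nothing I have written yet):

private void WriteMappedFiles()
{
    // Keep only the target pairs whose keys survived the de-duplication,
    // so the row-for-row mapping between the two columns is preserved.
    var keptKeys = new HashSet<int>(SourceJobCode_Distinct.Select(pair => pair.Key));
    var targetFiltered = TargetJobCodeParallel
        .Where(pair => keptKeys.Contains(pair.Key))
        .ToList();

    // Write both columns out, one value per line, in the same order.
    File.WriteAllLines(PathToFiles + "\\SourceJobCodesMapped.txt",
        SourceJobCode_Distinct.Select(pair => pair.Value));
    File.WriteAllLines(PathToFiles + "\\TargetJobCodesMapped.txt",
        targetFiltered.Select(pair => pair.Value));
}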
Is there a better/faster/more logical way to do this?