I'm trying to remove duplicate combinations from a CSV file.

I tried using Distinct, but the output stays the same.

string path;
string newcsvpath = @"C:\Documents and Settings\MrGrimm\Desktop\clean.csv";

OpenFileDialog openfileDial = new OpenFileDialog();

if (openfileDial.ShowDialog() == DialogResult.OK)
{
    path = openfileDial.FileName;

    var lines = File.ReadLines(path);
    var grouped = lines.GroupBy(line => string.Join(", ", line.Split(',').Distinct())).ToArray();

    var unique = grouped.Select(g => g.First());
    var buffer = new StringBuilder();

    foreach (var name in unique)
    {
        string value = name;
        buffer.AppendLine(value);
    }

    File.WriteAllText(newcsvpath, buffer.ToString());
    label5.Text = "Complete";
}

For example, I have a combination of

{ 1,1,1,1,1,1,1,1 }      { 1,1,1,1,1,1,1,2 } 
{ 2,1,1,1,1,1,1,1 }      { 1,1,1,2,1,1,1,1 }

The output should be

{ 1,1,1,1,1,1,1,1 }
{ 2,1,1,1,1,1,1,1 } 
marc_s

2 Answers

From your example, it seems that you want to treat each line as a sequence of numbers, and that you consider two lines equal if one sequence is a permutation of the other.

So from reading your file, you have:

var lines = new[] 
{
    "1,1,1,1,1,1,1,1",
    "1,1,1,1,1,1,1,2",
    "2,1,1,1,1,1,1,1",
    "1,1,1,2,1,1,1,1"
};

Now let's convert it to an array of number sequences:

var linesAsNumberSequences = lines.Select(line => line.Split(',')
        .Select(int.Parse)
        .ToArray())
    .ToArray();

Or better, since we are not interested in permutations, we can sort the numbers in the sequences immediately:

var linesAsSortedNumberSequences = lines.Select(line => line.Split(',')
        .Select(int.Parse)
        .OrderBy(number => number)
        .ToArray())
    .ToArray();
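
To see why sorting helps, here is a small sketch that prints, for each sample line, the grouping key produced by the question's `Distinct()` call next to a key built from the sorted numbers. `KeyDemo`, `DistinctKey`, and `SortedKey` are hypothetical names introduced for illustration only.

```csharp
using System;
using System.Linq;

public static class KeyDemo
{
    // Key used in the question: Distinct() drops repeated values within one line,
    // so the counts are lost and the order of the values is not normalized.
    public static string DistinctKey(string line)
    {
        return string.Join(", ", line.Split(',').Distinct());
    }

    // Key built from the sorted numbers: every permutation of a line
    // yields the same string.
    public static string SortedKey(string line)
    {
        return string.Join(",", line.Split(',')
            .Select(int.Parse)
            .OrderBy(number => number));
    }

    public static void Main()
    {
        var lines = new[]
        {
            "1,1,1,1,1,1,1,1",
            "1,1,1,1,1,1,1,2",
            "2,1,1,1,1,1,1,1",
            "1,1,1,2,1,1,1,1"
        };

        foreach (var line in lines)
        {
            Console.WriteLine("{0} | distinct: {1} | sorted: {2}",
                line, DistinctKey(line), SortedKey(line));
        }
    }
}
```

With the sorted key, the last three sample lines all map to `1,1,1,1,1,1,1,2`, whereas the `Distinct()` key only keeps one copy of each value per line.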

When using Distinct on this, we have to pass a comparer that considers two arrays equal if they contain the same elements, because arrays are otherwise compared by reference. Let's use the one from this SO question:

var result = linesAsSortedNumberSequences.Distinct(new IEnumerableComparer<int>());
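
Note that `IEnumerableComparer<int>` is not part of the .NET base class library; it has to be supplied, for example from the linked question. The following is only a minimal sketch of what such a comparer might look like, assuming it should treat two sequences as equal when they hold the same elements in the same order. It is a hypothetical stand-in and not necessarily the implementation the linked question uses; `ComparerDemo` is likewise an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the comparer referenced above.
public sealed class IEnumerableComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // Same elements in the same order.
        return x.SequenceEqual(y);
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        // Combine the element hashes in order, so that sequences
        // considered equal also produce equal hash codes.
        unchecked
        {
            int hash = 17;
            foreach (T item in obj)
            {
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var lines = new[]
        {
            "1,1,1,1,1,1,1,1",
            "1,1,1,1,1,1,1,2",
            "2,1,1,1,1,1,1,1",
            "1,1,1,2,1,1,1,1"
        };

        var result = lines
            .Select(line => line.Split(',')
                .Select(int.Parse)
                .OrderBy(number => number)
                .ToArray())
            .Distinct(new IEnumerableComparer<int>());

        foreach (var sequence in result)
        {
            Console.WriteLine(string.Join(",", sequence));
        }
    }
}
```

For the four sample lines this leaves two sequences. Keep in mind that they are the sorted sequences, not the original lines from the file, so the original ordering within a line is not preserved by this approach.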
Klaus Gütter

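Try this:

// Assumes dtCSV is a DataTable loaded from the CSV file
// and columns holds its column names.
HashSet<string> record = new HashSet<string>();
foreach (DataRow row in dtCSV.Rows)
{
    // Build a key that identifies the row by its column values.
    StringBuilder textEditor = new StringBuilder();
    foreach (string col in columns)
    {
        textEditor.AppendFormat("[{0}={1}]", col, row[col].ToString());
    }
    if (!record.Add(textEditor.ToString()))
    {
        // Add returned false: this row duplicates an earlier one; handle it here.
    }
}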
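
For comparison, the same `HashSet<string>.Add` idiom can be applied directly to the lines of the file, without a `DataTable`. The sketch below is an illustration under stated assumptions, not this answer's code: `LineDeduplicator`, `SortedKey`, and `KeepFirstOfEachCombination` are hypothetical names, and sorting the values to build the key is borrowed from the other answer so that permutations of a line count as duplicates. The values are sorted as strings here, which assumes every number is written the same way throughout the file (no `01` next to `1`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineDeduplicator
{
    // Hypothetical helper: canonical key for a line, so that
    // permutations of the same values map to the same string.
    public static string SortedKey(string line)
    {
        return string.Join(",", line.Split(',')
            .Select(value => value.Trim())
            .OrderBy(value => value, StringComparer.Ordinal));
    }

    // Keeps the first original line seen for each combination of values.
    public static List<string> KeepFirstOfEachCombination(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var unique = new List<string>();
        foreach (string line in lines)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seen.Add(SortedKey(line)))
            {
                unique.Add(line);
            }
        }
        return unique;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "1,1,1,1,1,1,1,1",
            "1,1,1,1,1,1,1,2",
            "2,1,1,1,1,1,1,1",
            "1,1,1,2,1,1,1,1"
        };

        foreach (string line in KeepFirstOfEachCombination(lines))
        {
            Console.WriteLine(line);
        }
    }
}
```

Unlike sorting the sequences up front, this keeps each surviving line exactly as it appeared in the file, and it keeps whichever permutation occurs first.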
SUNIL DHAPPADHULE