-1

Possible Duplicate:
Compare two Lists for differences

I have the following set of arrays:

string[] arr1 = { "1155717", "5184305", "2531291", "1676341", "1916805" ... };
string[] arr2 = { "1155717", "1440230", "2531291", "8178626", "1916805" ... };
string[] arr3 = { "1155717", "5184305", "4025514", "1676341", ... };

The arrays contain millions of elements, and the elements can contain letters as well as digits. I want to create a CSV report like this:

diff.csv

arr1,arr2,arr3
1155717,1155717,1155717
5184305,--N/A--,5184305
--N/A--,1440230,--N/A--
--N/A--,--N/A--,4025514
1676341,--N/A--,1676341
--N/A--,8178626,--N/A--
1916805,1916805,--N/A--

I suspect that nested for loops comparing every element would not be a good approach. Any ideas?
A few things I missed:
1. Order doesn't matter.
2. Elements within a single list are unique.
3. I'd like to avoid explicit loops as far as possible and use the newer LINQ / generics features of .NET 3.5 / 4.0.

To those downvoting or voting to close this question: please explain why.

Pratik
  • Sort them and loop through them. – nhahtdh Dec 19 '12 at 13:22
  • http://stackoverflow.com/questions/675699/compare-two-lists-for-differences – GeorgesD Dec 19 '12 at 13:22
  • Have you tried the so-called naive approach of solving this with a series of nested loops yet? If so, is performance unacceptable? – Yuck Dec 19 '12 at 13:24
  • @GeorgesD: That is only for 2 lists at a time; I'm targeting 3 lists. Also, it's for .NET 2.0 and I'm looking for .NET 3.5. – Pratik Dec 19 '12 at 13:26
  • In what order should the elements be in the final output? Or doesn't it matter? – Mark Byers Dec 19 '12 at 13:26
  • Sorry, but what can be done for two can be done for 3 or more. – GeorgesD Dec 19 '12 at 13:27
  • @nhahtdh I plan to use .NET 3.5 LINQ or any new feature other than loops. Loops are the straightforward approach. – Pratik Dec 19 '12 at 13:28
  • @Yuck As I said, looping is the straightforward approach, which I'm aware of. I'm looking for something different, like the new features of .NET 3.5 / LINQ / generics. – Pratik Dec 19 '12 at 13:29
  • @Pratik My close vote (not downvote) is because you haven't tried anything at all. LINQ methods are **very well documented** as well as covered on Stack Overflow. Show what you've tried so far; don't just arrive here today demanding solutions. – Yuck Dec 19 '12 at 13:37
  • Maybe because it doesn't show any effort and/or isn't generally useful. Like GeorgesD points out, if you can compare 2, you can compare 3 or 4, or n lists. – weston Dec 19 '12 at 13:40

4 Answers

2

I have written a small example with arrays of type int, but the same approach can be applied to strings:

        int[] arr1 = { 1155717, 5184305, 2531291, 1676341, 1916805 } ;
        int[] arr2 = { 1155717, 1440230, 2531291, 8178626, 1916805 };
        int[] arr3 = { 1155717, 5184305, 4025514, 1676341 };

        foreach (int i in arr1)
        {
            Console.Write(i + "  ");
            foreach (int b in arr2)
            {
                if (i == b)
                    Console.Write(b + "  ");

            }
            foreach (int c in arr3)
            {
                if (i == c)
                    Console.Write(c + "  ");
            }
            Console.WriteLine();
        }
        Console.ReadLine();

The only problem is that this uses loops within loops, so if your arrays are large, performance will suffer. This is just a simple idea to get you thinking.

SpaceApple
  • I would rather recommend using collections in this case and not normal arrays since collections have built in functions to help speed this type of process up. – SpaceApple Dec 19 '12 at 13:44
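
As a hedged illustration of the comment above, the inner loops can be replaced by hash-set lookups. This is only a sketch: the class name `SetLookupDiff`, the method `BuildRows`, and its parameters are hypothetical names introduced here, and it assumes (as the question states) that elements within one array are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetLookupDiff
{
    // Hypothetical helper: returns one comma-separated line per distinct value,
    // in first-seen order, with `missing` where an array lacks that value.
    public static List<string> BuildRows(string missing, params string[][] arrays)
    {
        // One HashSet per array makes each membership test O(1) on average.
        var sets = arrays.Select(a => new HashSet<string>(a)).ToList();

        var seen = new HashSet<string>();
        var rows = new List<string>();
        foreach (var value in arrays.SelectMany(a => a))
        {
            if (!seen.Add(value)) continue; // value was already reported

            var cells = sets.Select(s => s.Contains(value) ? value : missing);
            rows.Add(string.Join(",", cells.ToArray()));
        }
        return rows;
    }
}
```

Calling `SetLookupDiff.BuildRows("--N/A--", arr1, arr2, arr3)` with string arrays visits every element once, so the work grows with the total number of elements rather than with the product of the array lengths.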
2

You could use this LINQ query and string.Join:

string[][] all = new[] { arr1, arr2, arr3 };
int maxLength = all.Max(arr => arr.Length);
string separator = ",";
string defaultValue = "N/A";

var csvFields = all.Select(arr => Enumerable.Range(0, maxLength)
                   .Select(i => arr.Length <= i ? defaultValue : arr[i]));
string csv = string.Join(Environment.NewLine, 
                        csvFields.Select(f => string.Join(separator, f)));
File.WriteAllText(path, csv);  

I put all the arrays in a jagged array. Then I use an int range as the starting point (0-4 in your sample, since the largest array has 5 elements). For each array I then take the element at each index in that range, substituting the default value "N/A" when the index is beyond the end of the array.

The last stage uses string.Join to link the parts of each array with your separator (",") and to join the resulting lines with Environment.NewLine.
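
To make the shape of that output concrete, here is a self-contained sketch of the same query wrapped in a method that returns the text instead of writing a file. The names `PositionalPadding` and `BuildCsv` are hypothetical, and the `string.Join` overload taking an `IEnumerable<string>` assumes .NET 4.0 or later.

```csharp
using System;
using System.Linq;

public static class PositionalPadding
{
    // Pads every array to the length of the longest one (by position, not by value)
    // and joins each array into a single separator-delimited line.
    public static string BuildCsv(string separator, string defaultValue, params string[][] all)
    {
        int maxLength = all.Max(arr => arr.Length);

        var csvFields = all.Select(arr => Enumerable.Range(0, maxLength)
                           .Select(i => arr.Length <= i ? defaultValue : arr[i]));

        return string.Join(Environment.NewLine,
                           csvFields.Select(f => string.Join(separator, f)));
    }
}
```

With the three sample arrays this yields one line per array, and the shorter third array ends in `N/A`. The padding is positional, so values are not matched against each other across arrays.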

Tim Schmelter
1

I'd do something like this, which is roughly O(4N) at worst, but maybe someone knows a faster method.

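// Array1, Array2 and Array3 are assumed to be string[] fields of the enclosing class.
private void PrintDiff()
{
    // One Model per distinct entry, recording which arrays contain it.
    var dictionary = new Dictionary<string, Model>();

    foreach (var entry in Array1)
    {
        GetOrAdd(dictionary, entry).Add(entry, "Array1");
    }
    foreach (var entry in Array2)
    {
        GetOrAdd(dictionary, entry).Add(entry, "Array2");
    }
    foreach (var entry in Array3)
    {
        GetOrAdd(dictionary, entry).Add(entry, "Array3");
    }

    //now print
    foreach (var model in dictionary.Values)
    {
        Console.WriteLine(model.ToString());
    }
}

// Returns the Model for an entry, creating it the first time the entry is seen.
// TryGetValue is a hash lookup on the key, unlike ContainsValue, which scans every value.
private static Model GetOrAdd(Dictionary<string, Model> dictionary, string entry)
{
    Model model;
    if (!dictionary.TryGetValue(entry, out model))
    {
        model = new Model();
        dictionary.Add(entry, model);
    }
    return model;
}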

public class Model
{

    public Model()
    {
        Dictionary = new Dictionary<string, string>();
    }

    private Dictionary<string, string> Dictionary
    {
        get;
        set;
    }

    public bool ContainsEntry(string entry)
    {
        return Dictionary.ContainsValue(entry);
    }

    public void Add(string entry, string arrayName)
    {
        Dictionary.Add(arrayName, entry);
    }

    public override string ToString()
    {
        return "FORMATED AS YOU WANT THEM";
    }
}
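
The ToString placeholder above leaves the formatting open. One possible way to fill it in is sketched below as a separate, hypothetical class (`RowModel` and `ToCsv` are names invented for this illustration), assuming the column names are passed in and the question's `--N/A--` marker is used for missing cells.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Model class: maps an array name to the entry found there.
public class RowModel
{
    private readonly Dictionary<string, string> entries = new Dictionary<string, string>();

    public void Add(string entry, string arrayName)
    {
        entries[arrayName] = entry;
    }

    // Emits one CSV line with a cell per column, using `missing`
    // for any column whose array did not contain the entry.
    public string ToCsv(string[] columns, string missing)
    {
        var cells = columns.Select(c => entries.ContainsKey(c) ? entries[c] : missing);
        return string.Join(",", cells.ToArray());
    }
}
```

Passing the column order explicitly keeps the cells aligned with the CSV header regardless of the order in which the arrays were processed.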
Moriya
1

You can use LINQ's GroupJoin:

string[] arr1 = { "1155717", "5184305", "2531291", "1676341", "1916805" };
string[] arr2 = { "1155717", "1440230", "2531291", "8178626", "1916805" };
string[] arr3 = { "1155717", "5184305", "4025514", "1676341" };

var allPossibleTerms = arr1.Union(arr2).Union(arr3);

var rows = allPossibleTerms
    .GroupJoin(arr1, all => all, a1 => a1, (all, a1) => new { Number = all, A1 = a1 })
    .SelectMany(joined => joined.A1.DefaultIfEmpty(), (collection, result) => new { collection.Number, A1 = result})
    .GroupJoin(arr2, joined => joined.Number, a2 => a2, (collection, a2) => new { Number = collection.Number, A1 = collection.A1, A2 = a2 })
    .SelectMany(joined => joined.A2.DefaultIfEmpty(), (collection, result) => new { collection.Number, A1 = collection.A1, A2 = result})
    .GroupJoin(arr3, joined => joined.Number, a3 => a3, (collection, a3) => new { Number = collection.Number, A1 = collection.A1, A2 = collection.A2, A3 = a3 })
    .SelectMany(joined => joined.A3.DefaultIfEmpty(), (collection, result) => new { collection.Number, A1 = collection.A1, A2 = collection.A2, A3 = result});

Basically, this creates a master list of all terms and left-joins each array onto it in turn.

╔══════════════════════════════════════╗
║ Number   A1       A2       A3        ║
╠══════════════════════════════════════╣
║ 1155717  1155717  1155717  1155717   ║
║ 5184305  5184305  -------  5184305   ║
║ 2531291  2531291  2531291  -------   ║
║ 1676341  1676341  -------  1676341   ║
║ 1916805  1916805  1916805  -------   ║
║ 1440230  -------  1440230  -------   ║
║ 8178626  -------  8178626  -------   ║
║ 4025514  -------  -------  4025514   ║
╚══════════════════════════════════════╝
Dave Bish