1

In my asp.net c# application, I have following list of occurrences of item combinations. I want to list the most frequently occurrence combinations.

  1. Item1
  2. Item1, Item2
  3. Item3
  4. Item1, Item3, Item2
  5. Item3, Item1
  6. Item2, Item1

According to the above example, I should get below output.

most frequently occurrence of the combinations are;

  1. Item1 & Item2 - No of occurrences are 3 (#2, #4 & #6)
  2. Item1 & Item3 - No of occurrences are 2 (#4 & #5)

My structure is as below.

public class MyList
{
    public List<MyItem> MyItems { get; set; }
}

public class MyItem
{
    public string ItemName { get; set; }
}
  • 3
    Maybe you can be more specific about the type of your "items". It would also help to see the code. – Tim Schmelter Oct 30 '18 at 16:16
  • I'm fairly certain I'd end up using a `GroupBy` or something here, but impossible to say whether that's the appropriate approach without us seeing the structure of the classes you mention. –  Oct 30 '18 at 16:21
  • Item1 has the most number of occurrences... – sean Oct 30 '18 at 16:21
  • are you deaing with `List>`? – Ashkan Mobayen Khiabani Oct 30 '18 at 16:21
  • @sean: he searches the most frequent **combinations** not single items. A combination contains at least two so the first and third are not relevant. – Tim Schmelter Oct 30 '18 at 16:25
  • @TimSchmelter No, combinations does not imply more than 1. I commented because I think the question needs clarification. – sean Oct 30 '18 at 16:28
  • 1
    @sean: according to the expected result it does imply it. He doesn't want to know which item occurs most but which combinatios of two items occur most. Those two must be in the same sub-collection. – Tim Schmelter Oct 30 '18 at 16:29
  • Yes @TimSchmelter is right. I need combinations (more than one items), not single occurrences. –  Oct 30 '18 at 16:35
  • My question to the OP would be: how do you end up getting the items 1-6 listed in the post? (i.e., are they the results of each iteration of something or are they coming from one list?) – nocturns2 Oct 30 '18 at 16:35
  • Do you have access to an SSIS server? In case you have very large data, applying Market Basket analysis with SSIS package through c# may help you out. For more information on running SSIS package with C#, this article may help. https://learn.microsoft.com/en-us/sql/integration-services/ssis-quickstart-run-dotnet?view=sql-server-2017 – Hitesh Gaur Oct 30 '18 at 16:39
  • @nocturns2 I have updated my questions. The item list is like List< MyList >() –  Oct 30 '18 at 16:50
  • Occurrence is implied but not defined. Only double is not defined. -1 – paparazzo Oct 30 '18 at 18:55

4 Answers4

1

Out of the top of my head i would map all possible combinations using a hash where ab is the same as ba (or you could order your items alphabetically for example and then hash them) and then just count occurrences of the hashes...

Leonardo
  • 10,737
  • 10
  • 62
  • 155
0

You can create a weighted graph from your list with weight between two nodes representing frequency of occurrence. This StackExchange post has some information, as well as you can learn about adjacency matrix on this previous SO post here.

According to me, it would be wise to use HashSet<Tuple<Item1, Item2>> to represent a connection and have it's value stored in a dictionary.

For multiple items, the problem is similar to finding out which path was traversed most, in path traversal algorithm for graphs.

Though for very large set of data, I recommend using SSAS and SSIS services through SQL Statements and Analysis Queries dynamically with C# to create a market basket analysis, which should generate desired statistics for you.

Hitesh Gaur
  • 362
  • 1
  • 11
  • i don't think this is a very good approach because you can have "item1, 2, 3, 4, 5, 6" as a occurrence... – Leonardo Oct 30 '18 at 17:04
  • Thanks for the answer. The problem is there can be more than 2 items as the combination. See #4. –  Oct 30 '18 at 17:12
  • @Duminda: You can have tuple of Hashset of Hashset, use SetEquals Method to compare if combination exists or not.. and then increment the counter accordingly, for all combination of specific row, which should be a factorial of the count.. – Hitesh Gaur Oct 30 '18 at 18:37
0

Here is a quick and dirty way to do this to get you started. You should probably use hash tables for performance, but I think Dictionaries are easier to visualize.

Fiddle: https://dotnetfiddle.net/yofkLf

public static void Main()
{
    List<MyItem[]> MyItems = new List<MyItem[]>()
    {
        new MyItem[] { new MyItem("Item1") },
        new MyItem[] { new MyItem("Item1"), new MyItem("Item2") },
        new MyItem[] { new MyItem("Item3") },
        new MyItem[] { new MyItem("Item1"), new MyItem("Item3"), new MyItem("Item2") },
        new MyItem[] { new MyItem("Item3"), new MyItem("Item1") },
        new MyItem[] { new MyItem("Item2"), new MyItem("Item1") }
    };
    Dictionary<Tuple<string, string>, int> results = new Dictionary<Tuple<string, string>, int>();      
    foreach (MyItem[] arr in MyItems)
    {
        // Iterate through the items in the array. Then, iterate through the items after that item in the array to get all combinations.
        for (int i = 0; i < arr.Length; i++)
        {
            string s1 = arr[i].ItemName;
            for (int j = i + 1; j < arr.Length; j++)
            {
                string s2 = arr[j].ItemName;
                // Order the Tuple so that (Item1, Item2) is the same as (Item2, Item1).                    
                Tuple<string, string> t = new Tuple<string, string>(s1, s2);
                if (string.Compare(s1, s2) > 0)
                {
                    t = new Tuple<string, string>(s2, s1);  
                }
                if (results.ContainsKey(t))
                {
                    results[t]++;
                }
                else
                {
                    results[t] = 1;
                }
            }
        }
    }
    // And here are your results.
    // You can always use Linq to sort the dictionary by values.
    foreach (var v in results)
    {
        Console.WriteLine(v.Key.ToString() + " = " + v.Value.ToString());
        // Outputs:
        // (Item1, Item2) = 3
        // (Item1, Item3) = 2
        // (Item2, Item3) = 1
    }
}

...

public class MyItem
{
    public string ItemName { get; set; }
    public MyItem(string ItemName)
    {
        this.ItemName = ItemName;   
    }
}

Of course this would be different if you didn't have that string property in MyItems.

zambonee
  • 1,599
  • 11
  • 17
0

Here's a rough O(N^2) approach:

  • Iterate over the outer collection (the List<List<Item>>)
  • Come up with a way to define the current row, call it rowId
  • Now iterate the known row ids (inner iteration).
  • Count when one of these is a complete subset of the other; either the current row is contained in a previous set, or the previous set is contained in the current row. (This is the solution you want.) This works be incrementing the count of the rows previously seen if they are a subset of the current row, or tracking the number of times the current row is a subset of the previously seen combinations and setting that at the end of each inner iteration.

Some assumptions:

  • You don't care about every possible combination of items, only combinations that have already been seen.
  • Items have a unique identifier

Like I said above, this is an O(N^2) approach, so performance may be a concern. There's also two checks for subset membership which may be a performance issue. I'm also just joining and splitting ids as strings, you can probably get a more optimal solution by setting up another dictionary that tracks ids. There's also some room for improvement with Dictionary.TryGetValue. Extracting the sets of items you want is left as an exercise for the reader, but should be a straightforward OrderBy(..).Where(...) operation. But this should get you started.

public class MyItem
{
    public string ItemName { get; set; }
}

class Program
{
    public static void GetComboCount()
    {
        var itemsCollection = new List<List<MyItem>>() {
            new List<MyItem>() { new MyItem() { ItemName = "Item1" } },
            new List<MyItem>() { new MyItem() { ItemName = "Item1" }, new MyItem() { ItemName = "Item2" } },
            new List<MyItem>() { new MyItem() { ItemName = "Item3" } },
            new List<MyItem>() { new MyItem() { ItemName = "Item1" }, new MyItem() { ItemName = "Item3" }, new MyItem() { ItemName = "Item2" } },
            new List<MyItem>() { new MyItem() { ItemName = "Item3" }, new MyItem() { ItemName = "Item1" } },
            new List<MyItem>() { new MyItem() { ItemName = "Item2" }, new MyItem() { ItemName = "Item1" } }
        };

        var comboCount = new Dictionary<string, int>();

        foreach (var row in itemsCollection)
        {
            var ids = row.Select(x => x.ItemName).OrderBy(x => x);
            var rowId = String.Join(",", ids);
            var rowIdCount = ids.Count();

            var seen = false;

            var comboCountList = comboCount.ToList();
            int currentRowCount = 1;

            foreach (var kvp in comboCountList)
            {
                var key = kvp.Key;
                if (key == rowId)
                {
                    seen = true;
                    currentRowCount++;
                    continue;
                }

                var keySplit = key.Split(',');
                var keyIdCount = keySplit.Length;

                if (ids.Where(x => keySplit.Contains(x)).Count() == keyIdCount)
                {
                    comboCount[kvp.Key] = kvp.Value + 1;
                }
                else if (keySplit.Where(x => ids.Contains(x)).Count() == rowIdCount)
                {
                    currentRowCount++;
                }
            }

            if (!seen)
            {
                comboCount.Add(rowId, currentRowCount);
            }
            else
            {
                comboCount[rowId] = currentRowCount;
            }
        }

        foreach (var kvp in comboCount)
        {
            Console.WriteLine(String.Format("{0}: {1}", kvp.Key, kvp.Value));
        }
    }

    static void Main(string[] args)
    {
        GetComboCount();
    }
}

console output:

Item1: 5
Item1,Item2: 3
Item3: 3
Item1,Item2,Item3: 1
Item1,Item3: 2
BurnsBA
  • 4,347
  • 27
  • 39