30

I'm looking for a .Net implementation of a multiset. Can anyone recommend a good one?

(A multiset, or bag, is a set that can have duplicate values, and on which you can do set operations: intersection, difference, etc. A shopping cart for instance could be thought of as a multiset because you can have multiple occurrences of the same product.)

dan-gph
  • 16,301
  • 12
  • 61
  • 79
  • Please see: [C# Set collection?](http://stackoverflow.com/questions/183685/c-set-collection) – Mitch Wheat Apr 08 '10 at 05:10
  • 1
    Thanks. A couple of posters mentioned Wintellect Power Collections, which has a Bag type. It looks pretty good. – dan-gph Apr 08 '10 at 05:20
  • There's also the C5 stuff, but I don't think it implements set operations. – Mitch Wheat Apr 08 '10 at 05:25
  • 1
    You might also want to have a look at http://www.koders.com/csharp/fid38F43B7D46B1F44EA81C7EADA9E4B91F58E75123.aspx?s=textbox Hope it helps. – Izmoto Jul 16 '10 at 06:13

5 Answers5

6

I do not know about one, however you could use a Dictionary for that, in which the value is the quantity of the item. And when the item is added for the second time, you vould increase the value for it in the dictionary.

An other possibility would be to simply use a List of items, in which you could put duplicates. This might be a better approach for a shopping cart.

treaschf
  • 5,788
  • 1
  • 25
  • 24
  • 1
    Nice idea. But I need to be able to find the difference between two sets efficiently. If I rolled my own, I'd have to put a lot of effort into making sure it had that property. So I'd rather not do that. – dan-gph Apr 08 '10 at 05:27
  • 1
    Your idea for dictionaries with counts has grown on me. I think it would work well if your items have a Count property (which they do happen to have in my case) rather than being discrete values. Set difference should be O(N). A multiset would be better if you have discrete values. – dan-gph Apr 12 '10 at 01:52
5

Anything calling itself a C# implementation of a multiset should not be based on a Dictionary internally. Dictionaries are hash tables, unordered collections. C++'s sets, multisets, maps, and multimaps are ordered. Internally each is represented as some flavor of a self-balancing binary search tree.

In C# we should then use a SortedDictionary as the basis of our implementation as according to Microsoft's own documentation a SortedDictionary "is a binary search tree with O(log n) retrieval". A basic multiset can be implemented as follows:

public class SortedMultiSet<T> : IEnumerable<T>
{
    private SortedDictionary<T, int> _dict; 

    public SortedMultiSet()
    {
        _dict = new SortedDictionary<T, int>();
    }

    public SortedMultiSet(IEnumerable<T> items) : this()
    {
        Add(items);
    }

    public bool Contains(T item)
    {
        return _dict.ContainsKey(item);
    }

    public void Add(T item)
    {
        if (_dict.ContainsKey(item))
            _dict[item]++;
        else
            _dict[item] = 1;
    }

    public void Add(IEnumerable<T> items)
    {
        foreach (var item in items)
            Add(item);
    }

    public void Remove(T item)
    {
        if (!_dict.ContainsKey(item))
            throw new ArgumentException();
        if (--_dict[item] == 0)
            _dict.Remove(item);
    }

    // Return the last value in the multiset
    public T Peek()
    {
        if (!_dict.Any())
            throw new NullReferenceException();
        return _dict.Last().Key;
    }

    // Return the last value in the multiset and remove it.
    public T Pop()
    {
        T item = Peek();
        Remove(item);
        return item;
    }

    public IEnumerator<T> GetEnumerator()
    {
        foreach(var kvp in _dict)
            for(int i = 0; i < kvp.Value; i++)
                yield return kvp.Key;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
jwezorek
  • 8,592
  • 1
  • 29
  • 46
  • 2
    Except you can't go to the next/previous item after finding one, and there is no [`equal_range`](http://en.cppreference.com/w/cpp/algorithm/equal_range). This is where C++ `(multi_)set` and `(multi_)map` shine, you can rapidly find the beginning and end of a range and process everything in the range. – doug65536 Oct 06 '16 at 15:22
  • 4
    What's the motivation for sorting a multiset? The mathematical concept isn't ordered. A `Dictionary` isn't sorted, but so what? – Kenny Evitt May 15 '17 at 17:11
  • the motivation for sorting a multiset is that the C++ standard library data structure std::multiset is ordered, so that when a lot of people hear that someone is looking for a .Net implementation of "multiset", that exact name, is sounds like they are asking for a .Net implementation of std::multiset which would need to be ordered. – jwezorek May 15 '17 at 19:28
  • 14
    Why would someone who is looking for a .NET implementation of "multiset" except it to be 100% compliant with the semantics of `std::multiset` in C++, instead of, say, the mathematical concept of "multiset" (which is unordered) or the concept of "multiset" in pretty much every other programming language on the planet (most which are unordered). – Jörg W Mittag Sep 04 '18 at 12:40
  • Old answer, I know, but you should read about [unordered_multiset](https://en.cppreference.com/w/cpp/container/unordered_multiset) – Not a real meerkat Feb 13 '20 at 16:35
2

Another option is to just wrap SortedSet, but instead of storing your type T in it, you store the value tuple (T value, int counter) where counter goes up by 1 with each new instance of value that is inserted. Essentially you're forcing the values to be distinct. You can efficiently use GetViewBetween() to find the largest value of counter for a particular value, then increment it to get the counter for a newly-added value. And unlike the count dictionary solution, you can use GetViewBetween() to replicate the functionality equal_range, lower_bound, and upper_bound gives in C++. Here is some code showing what I mean:

public class SortedMultiSet<T> : IEnumerable<T>
{
    public void Add(T value)
    {
        var view = set.GetViewBetween((value, 0), (value, int.MaxValue));
        int nextCounter = view.Count > 0 ? view.Max.counter + 1 : 0;
        set.Add((value, nextCounter));
    }

    public bool RemoveOne(T value)
    {
        var view = set.GetViewBetween((value, 0), (value, int.MaxValue));
        if (view.Count == 0) return false;
        set.Remove(view.Max);
        return true;
    }

    public bool RemoveAll(T value)
    {
        var view = set.GetViewBetween((value, 0), (value, int.MaxValue));
        bool result = view.Count > 0;
        view.Clear();
        return result;
    }

    public SortedMultiSet<T> GetViewBetween(T min, T max)
    {
        var result = new SortedMultiSet<T>();
        result.set = set.GetViewBetween((min, 0), (max, int.MaxValue));
        return result;
    }

    public IEnumerator<T> GetEnumerator() =>
        set.Select(x => x.value).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() =>
        set.Select(x => x.value).GetEnumerator();

    private SortedSet<(T value, int counter)> set =
        new SortedSet<(T value, int counter)>();
}

Now you can write something like this:

var multiset = new SortedMultiSet<int>();
foreach (int i in new int[] { 1, 2, 2, 3, 4, 5, 5, 6, 7, 7, 8 })
{
    multiset.Add(i);
}
foreach (int i in multiset.GetViewBetween(2, 7))
{
    Console.Write(i + " "); // Output: 2 2 3 4 5 5 6 7 7
}

In the past, there were some issues where GetViewBetween() ran in time O(output size), rather than time O(log n), but I think those have been resolved. At the time it would count up nodes to cache the count, it now uses hierarchical counts to perform Count operations efficiently. See this StackOverflow post and this library code.

Chiara Coetzee
  • 4,201
  • 1
  • 24
  • 20
1
public class Multiset<T>: ICollection<T>
{
    private readonly Dictionary<T, int> data;

    public Multiset() 
    {
        data = new Dictionary<T, int>();
    }

    private Multiset(Dictionary<T, int> data)
    {
        this.data = data;
    }

    public void Add(T item)
    {
        int count = 0;
        data.TryGetValue(item, out count);
        count++;
        data[item] = count;
    }

    public void Clear()
    {
        data.Clear();
    }

    public Multiset<T> Except(Multiset<T> another)
    {
        Multiset<T> copy = new Multiset<T>(new Dictionary<T, int>(data));
        foreach (KeyValuePair<T, int> kvp in another.data)
        {
            int count;
            if (copy.data.TryGetValue(kvp.Key, out count))
            {
                if (count > kvp.Value)
                {
                    copy.data[kvp.Key] = count - kvp.Value;
                }
                else
                {
                    copy.data.Remove(kvp.Key);
                }
            }
        }
        return copy;
    }

    public Multiset<T> Intersection(Multiset<T> another)
    {
        Dictionary<T, int> newData = new Dictionary<T, int>();
        foreach (T t in data.Keys.Intersect(another.data.Keys))
        {
            newData[t] = Math.Min(data[t], another.data[t]);
        }
        return new Multiset<T>(newData);
    }

    public bool Contains(T item)
    {
        return data.ContainsKey(item);
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        foreach (KeyValuePair<T, int> kvp in data)
        {
            for (int i = 0; i < kvp.Value; i++)
            {
                array[arrayIndex] = kvp.Key;
                arrayIndex++;
            }
        }
    }

    public IEnumerable<T> Mode()
    {
        if (!data.Any())
        {
            return Enumerable.Empty<T>();
        }
        int modalFrequency = data.Values.Max();
        return data.Where(kvp => kvp.Value == modalFrequency).Select(kvp => kvp.Key);
    }

    public int Count
    {
        get 
        {
            return data.Values.Sum();
        }
    }

    public bool IsReadOnly
    {
        get 
        { 
            return false; 
        }
    }

    public bool Remove(T item)
    {
        int count;
        if (!data.TryGetValue(item, out count))
        {
            return false;
        }
        count--;
        if (count == 0)
        {
            data.Remove(item);
        }
        else
        {
            data[item] = count;
        }
        return true;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return new MultisetEnumerator<T>(this);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return new MultisetEnumerator<T>(this);
    }

    private class MultisetEnumerator<T> : IEnumerator<T>
    {
        public MultisetEnumerator(Multiset<T> multiset)
        {
            this.multiset = multiset;
            baseEnumerator = multiset.data.GetEnumerator();
            index = 0;
        }

        private readonly Multiset<T> multiset;
        private readonly IEnumerator<KeyValuePair<T, int>> baseEnumerator;
        private int index;

        public T Current
        {
            get 
            {
                return baseEnumerator.Current.Key;
            }
        }

        public void Dispose()
        {
            baseEnumerator.Dispose();
        }

        object System.Collections.IEnumerator.Current
        {
            get 
            {
                return baseEnumerator.Current.Key;
            }
        }

        public bool MoveNext()
        {
            KeyValuePair<T, int> kvp = baseEnumerator.Current;
            if (index < (kvp.Value - 1))
            {
                index++;
                return true;
            }
            else
            {
                bool result = baseEnumerator.MoveNext();
                index = 0;
                return result;
            }
        }

        public void Reset()
        {
            baseEnumerator.Reset();
        }
    }
}
Paul Richards
  • 1,181
  • 1
  • 10
  • 29
  • Old thread, old answer, yes, yes, I know. Anyway, You have a nested template argument in your enumerator class. You don't need that. You can just use `private class MultisetEnumerator : IEnumerator`, since T is already defined in the scope of the inner class. – Arshia001 Oct 25 '17 at 12:39
  • This could be made much more efficient by eliminating a lot of the duplicate lookups. – NetMage Jul 05 '18 at 19:42
1

You can use this implementation of a sorted multiset: SortedMultiSet.cs

user3324131
  • 923
  • 8
  • 10