3

I am trying to find a appropriate data structure for conditional probability table in C#, see for example - Russell and Norvig, Artificial Intelligence A Modern Approach. enter image description hereI need to define a structure where the combination of key-value pairs defines some number. For example:

B E P
true true 0.95
true false 0.94
false true 0.29
false false 0.001

or

Cold Flu Malaria P
false false false 0.0
false false true 0.9
false true false 0.8
false true true 0.98

The number of key-value pairs can be arbitrary, but the numeric value that this combination defines (P) is always one. What structure in C# can be used fir this?

I used a dictionary that get another dictionary as a key. Something like:

Dictionary<string, bool> row1 = new Dictionary<string, bool>() { { "B", true }, { "E", true } };
Dictionary<string, bool> row2 = new Dictionary<string, bool>() { { "B", true }, { "E", false } };
Dictionary<string, bool> row3 = new Dictionary<string, bool>() { { "B", false }, { "E", true } };
Dictionary<string, bool> row4 = new Dictionary<string, bool>() { { "B", false }, { "E", false } };
Dictionary<string, bool> row5 = new Dictionary<string, bool>() { { "Cold", false }, { "Flu", false }, { "Malaria", false } };

Dictionary<Dictionary<string, bool>, double> cpt =
    new Dictionary<Dictionary<string, bool>, double>();

cpt.Add(row1, 0.95);
cpt.Add(row2, 0.94);
cpt.Add(row3, 0.29);
cpt.Add(row4, 0.001);
cpt.Add(row5, 0.0);

double prob = cpt[row1]; //work

Dictionary<string, bool> row6 = new Dictionary<string, bool>() { { "E", true }, { "B", true } };

prob = cpt[row6]; // does not work

Console.WriteLine("Prop = {0}", prob);

However, when the order of the (key-value) pairs in the key-dictionary is changed (for example, row1 and row6), an exception is thrown:

The given key 'System.Collections.Generic.Dictionary`2[System.String,System.Boolean' 
was not present in the dictionary

I would like to provide the possibility of an arbitrary sequence of key-value pairs in keys-dictionaries.

Artem
  • 71
  • 3
  • 1
    There is no builtin collection for this, so you need to build your own. You should think about what specifications your collection should have before you start, and possibly even write unit tests before starting to implement it, to find possible usability problems before you start implementing. – JonasH Aug 23 '23 at 09:35
  • That link confused me. Which chapter should I refer to? How can I read it? – shingo Aug 23 '23 at 09:49
  • 1
    _"The number of key-value pairs can be arbitrary"_ If the number is not very large, you can use bit flags, otherwise you can use a sorted collection with a custom comparer. – shingo Aug 23 '23 at 09:58
  • I will think about custom comparer. Thanks! – Artem Aug 23 '23 at 14:44

5 Answers5

2

You could consider using a Dictionary<K,V> having value tuples as keys. For example:

Dictionary<(bool, bool), double> cpt = new();

(bool, bool) tt = (true, true);
(bool, bool) tf = (true, false);
(bool, bool) ft = (false, true);
(bool, bool) ff = (false, false);

cpt.Add(tt, 0.95);
cpt.Add(tf, 0.94);
cpt.Add(ft, 0.29);
cpt.Add(ff, 0.001);

double prob = cpt[tt];

The notation (bool, bool) is a shorthand for the type ValueTuple<bool, bool>.

The arity of the key must be predefined. If you want to store both tt and ttt in the same dictionary, you will have to switch to the ITuple interface as the key, with loss of performance.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • The number of input parameters (bool type) in cpt can be arbitrary (not just 2) – Artem Aug 23 '23 at 10:38
  • 1
    Just an idea: If the number of parameters can vary, why not creating an individual dictionary for each number or can the amount of parameters be higher than 16? These individual dictionaries could be _hidden_ behind an interface / service that defines all methods for each amount of parameters. – Oliver Aug 23 '23 at 10:54
  • @Oliver - You'd still need to manually create the keys. I don't think hiding behind an interface helps. I'm keen to see an implementation. – Enigmativity Aug 24 '23 at 05:10
  • @Enigmativity Just tried something, but it works a little different. At the end you formalize your different cases into a _magic strings_ and then you have the abilitiy to use a _Dictionary_. – Oliver Aug 24 '23 at 10:34
  • @Oliver - Generally a bad approach from a performance POV, but for smaller use cases it's fine. – Enigmativity Aug 28 '23 at 00:25
0

Don't know if this would really work or what are the performance impacts, but I would start with the following classes to store the information:

public class Case
{
    public Dictionary<string, bool> Constraints { get; }
        = new Dictionary<string, bool>();

    public double Probability { get; set; }
}

public class Scenario
{
    public List<Case> Cases { get; }
        = new List<Case>();

    public IEnumerable<string> ConstraintsAvailable
        => Cases.SelectMany(c => c.Constraints.Keys).Distinct();

    public double GetProbability(string caseDescription)
    {
        var cases = CaseDescriptionConverter.FromString(caseDescription);
        var caseFound = Cases.Find(c => c.Constraints.SequenceEqual(cases))
            ?? throw new InvalidOperationException("Case not found");

        return caseFound.Probability;
    }
}

Additionally a converter class to easier describe the scenario I'd like to request:

public static class CaseDescriptionConverter
{
    public static string ToString(params (string constraint, bool value)[] cases)
    {
        return string.Join(';', cases
            .OrderBy(c => c.constraint)
            .Select(c => $"{c.constraint}={c.value}"));
    }

    public static string ToString(IEnumerable<KeyValuePair<string, bool>> cases)
    {
        return string.Join(';', cases
            .OrderBy(c => c.Key)
            .Select(c => $"{c.Key}={c.Value}"));
    }

    public static IEnumerable<KeyValuePair<string, bool>> FromString(string description)
    {
        return description.Split(';')
            .Select(c =>
            {
                var parts = c.Split('=');
                return new KeyValuePair<string, bool>(parts[0], bool.Parse(parts[1]));
            })
            .OrderBy(c => c.Key);
    }
}

With this available I could build up a scenario with different cases:

var scenario = new Scenario();
scenario.Cases.Add(new Case
{
    Constraints =
    {
        ["B"] = true,
        ["E"] = true,
    },
    Probability = 0.95
});
scenario.Cases.Add(new Case
{
    Constraints =
    {
        ["B"] = true,
        ["E"] = false,
    },
    Probability = 0.94
});
scenario.Cases.Add(new Case
{
    Constraints =
    {
        ["B"] = false,
        ["E"] = true,
    },
    Probability = 0.29
});
scenario.Cases.Add(new Case
{
    Constraints =
    {
        ["B"] = false,
        ["E"] = false,
    },
    Probability = 0.001
});

scenario.Cases.Add(new Case
{
    Constraints =
    {
        ["Cold"] = false,
        ["Flu"] = false,
        ["Malaria"] = false,
    },
    Probability = 0
});
scenario.Cases.Add(new Case
{
    Constraints =
    {
        ["Cold"] = false,
        ["Flu"] = false,
        ["Malaria"] = true,
    },
    Probability = 0.9
});

scenario.Cases.Add(new Case
{
    Constraints =
    {
        ["Cold"] = false,
        ["Flu"] = true,
        ["Malaria"] = false,
    },
    Probability = 0.8
});

scenario.Cases.Add(new Case
{
    Constraints =
    {
        ["Cold"] = false,
        ["Flu"] = true,
        ["Malaria"] = true,
    },
    Probability = 0.98
});

And check for a case with this kind of code:

var caseDescription = CaseDescriptionConverter.ToString(("B", true), ("E", true));
var probability = scenario.GetProbability(caseDescription);

Console.WriteLine($"{caseDescription}: {probability}");

Or these lines:

var caseDescription = CaseDescriptionConverter.ToString(("Cold", false), ("Flu", true), ("Malaria", true));
var probability = scenario.GetProbability(caseDescription);

Console.WriteLine($"{caseDescription}: {probability}");

Another optimization would be to memoize the description of a case into a string and having a Dictionary<string, double> where you directly look up the probabiltiy for the given case.

Oliver
  • 43,366
  • 8
  • 94
  • 151
0

What about to use as a key of a dictionary a class like this

public class Vector
{
    public object[] Items { get; }

    public Vector(params object[] items)
    {
        this.Items = items;
    }

    public override int GetHashCode()
    {
        var hashCode = 0;
        foreach (var item in Items)
        {
            hashCode ^= item.GetHashCode();
        }
        return hashCode;
    }

    public override bool Equals(object? obj)
    {
        if (obj is not Vector other) return false;
        if (Items.Length != other.Items.Length) return false;

        for (int i = 0; i < Items.Length; i++)
        {
            if (!Items[i].Equals(other.Items[i])) return false;
        }

        return true;
    }
}

So you can use it like this

var tableOne = new Dictionary<Vector, double>
{
    [new Vector(true, true)] = 0.95,
    [new Vector(true, false)] = 0.94,
    [new Vector(false, true)] = 0.29,
    [new Vector(false, false)] = 0.001,
};

var tableTwo = new Dictionary<Vector, double>
{
    [new Vector(false, false, false)] = 0,
    [new Vector(false, false, true)] = 0.9,
    [new Vector(false, true, false)] = 0.8,
    [new Vector(false, true, true)] = 0.98,
};

Console.Write(tableOne[new Vector(false, true)]); // writes 0.29

You could even go like this

var anotherTable = new Dictionary<Vector, double>
{
    [new Vector(false, 1, "a")] = 0,
    [new Vector(true, 2, "b")] = 0.9,
    [new Vector(false, 3, "c")] = 0.8,
    [new Vector(false, 4, "x")] = 0.98,
};
Console.Write(anotherTable[new Vector(true, 2, "b")]); // writes 0.9
0

My own solution (thanks to Shingo and Theodor Zoulias for their advices and all participants of the discussion):

public class ListComparer: IEqualityComparer<Dictionary<string, bool>>
{
    public bool Equals(Dictionary<string, bool>? x, Dictionary<string, bool>? y)
    {
        if (x is null || y is null) return false;
        if (x.Count != y.Count) return false;

        foreach (KeyValuePair<string, bool> elem in x)
        {
            if (!y.ContainsKey(elem.Key)) return false;
            else if (elem.Value != y[elem.Key]) return false;
        }

        return true;
    }

    public int GetHashCode([DisallowNull] Dictionary<string, bool> obj)
    {
        int hash = 0;

        foreach (KeyValuePair<string, bool> elem in obj)
        {
            var _hash = new HashCode();

            _hash.Add(elem.Key + elem.Value);
             hash += _hash.ToHashCode();
        }
        return hash;
    }

}

In Program.cs:

Dictionary<string, bool> row1 = new Dictionary<string, bool>() { { "B", true }, { "E", true } };

ListComparer comparer = new();

Dictionary<Dictionary<string, bool>, double> cpt = new Dictionary<Dictionary<string, bool>, double>(comparer);

cpt.Add(row1, 0.95);

double prob = cpt[row1]; //works
Console.WriteLine("Prob for row1 = {0}", prob);

Dictionary<string, bool> row6 = new Dictionary<string, bool>() { { "E", true }, { "B", true } };

double prob2 = cpt[row6]; // works now!

Console.WriteLine("Prob for row6 = {0}", prob2);

Code tests are given, see https://dotnetfiddle.net/BSqjrB

Artem
  • 71
  • 3
  • Your `ListComparer.GetHashCode` implementation makes the assumption that enumerating two dictionaries that contain the same keys will result on the equal keys enumerated in tandem. This is be no means guaranteed. In fact officially the enumeration order of the `Dictionary` collection is undefined. This incorrect assumption may result in incorrect program behavior. See [this question](https://stackoverflow.com/questions/76699084/get-hashcode-of-a-hashset-based-on-the-items-present-in-the-hashset "Get hashcode of a HashSet based on the items present in the HashSet") for solutions. – Theodor Zoulias Aug 24 '23 at 14:13
  • 1
    Maybe I misunderstood you, but I want to say that keys-dictionaries with an arbitrary keys order (for example { { "Cold", true }, { "Malaria", false }, { "Flu", true } } and { { "Flu", true }, { "Cold", true }, { "Malaria", false } }) are processed correctly by my program and considered the same. It's easy to test. – Artem Aug 24 '23 at 15:17
  • Have you tested that equal keys-dictionaries have equal hashcodes? This is the [basic rule](https://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/ "Guidelines and rules for GetHashCode") to which hashcodes **must** comply. – Theodor Zoulias Aug 24 '23 at 15:26
  • 1
    Yes, comparer.GetHashCode(dict1) and comparer.GetHashCode(dict2) are equals for keys-dictionaries fom my previous comment – Artem Aug 24 '23 at 15:36
  • I [tested your code](https://dotnetfiddle.net/7I8e4w) and you are right, the `ListComparer.GetHashCode` implementation is not sensitive to the enumeration order. This happens because the hashcodes of the keys and values are combined by simply adding them, and addition is transitive. It's probably a weak method of combining hashcodes though, regarding the quality of the combined hashcode. I am not seeing anyone suggesting this method in [this relevant question](https://stackoverflow.com/questions/1646807/quick-and-simple-hash-code-combinations "Quick and Simple Hash Code Combinations"). – Theodor Zoulias Aug 24 '23 at 15:57
  • 1
    Thanks you for your spending time, I will think about it – Artem Aug 24 '23 at 16:05
  • Actually the correct property of the addition is [commutative](https://en.wikipedia.org/wiki/Commutative_property), not [transitive](https://en.wikipedia.org/wiki/Transitive_relation) as I wrote earlier. – Theodor Zoulias Aug 24 '23 at 20:18
  • Doubling the hashcode (`_hash.ToHashCode() * 2`) is slightly detrimental. You just lose one of the 32 bits of entropy. Tripling the hashcode (`_hash.ToHashCode() * 3`) is neutral. You lose nothing and gain nothing. These changes don't improve the quality of the combined hashcode. – Theodor Zoulias Aug 25 '23 at 08:01
  • Interesting related question in Code Review: [Calculating GetHashCode efficiently with unordered list](https://codereview.stackexchange.com/questions/32024/calculating-gethashcode-efficiently-with-unordered-list). – Theodor Zoulias Aug 25 '23 at 09:13
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/255051/discussion-between-artem-and-theodor-zoulias). – Artem Aug 25 '23 at 12:48
0

I'd consider using indexer properties for this:

public R this[T index] { get; set; }

Try these classes:

public class Table<T, R> where R : new()
{
    private Dictionary<T, R> _dictionary = new Dictionary<T, R>();
    public R this[T index]
    {
        get =>
            _dictionary.TryGetValue(index, out R r)
            ? r
            : _dictionary[index] = new R();
        set { _dictionary[index] = value; }
    }
}

public static class Table
{
    public static Table<T, R> Create<T, R>() where R : new() => new Table<T, R>();
    public static Table<T1, Table<T2, R>> Create<T1, T2, R>() where R : new() => new Table<T1, Table<T2, R>>();
    public static Table<T1, Table<T2, Table<T3, R>>> Create<T1, T2, T3, R>() where R : new() => new Table<T1, Table<T2, Table<T3, R>>>();
    public static Table<T1, Table<T2, Table<T3, Table<T4, R>>>> Create<T1, T2, T3, T4, R>() where R : new() => new Table<T1, Table<T2, Table<T3, Table<T4, R>>>>();
}

Now you can use it like this:

var table = Table.Create<bool, bool, bool, double>();
table[true][false][false] = 0.0;
table[true][false][true] = 0.9;
table[true][true][false] = 0.8;
table[true][true][true] = 0.98;

Console.WriteLine(table[true][true][false]);

That outputs 0.8 as expected.

Or, like this:

public enum B { True, False }
public enum E { True, False }

var table = Table.Create<B, E, double>();
table[B.True][E.True] = 0.95;
table[B.True][E.False] = 0.94;
table[B.False][E.True] = 0.29;
table[B.False][E.False] = 0.001;

Console.WriteLine(table[B.False][E.True]);

That outputs 0.29 as expected.

Enigmativity
  • 113,464
  • 11
  • 89
  • 172
  • *`get => _dictionary.ContainsKey(index) ? _dictionary[index] : _dictionary[index] = new R();`* -- Doing a double lookup for retrieving a value is painful to watch. Using the [`TryGetValue`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2.trygetvalue) is preferable. – Theodor Zoulias Aug 25 '23 at 07:55
  • @TheodorZoulias - It's not much better with `TryGetValue`. I get `get => _dictionary.TryGetValue(index, out R r) ? r : _dictionary[index] = new R();`. Is that what you had in mind? – Enigmativity Aug 25 '23 at 08:11
  • Yep, in case the key already exists (which presumably is the common case), this reduces the lookups to just 1. – Theodor Zoulias Aug 25 '23 at 09:11