
So I have this class called `ProbabilisticEnumerator`; it accepts a map of elements and integer-based weights. See the following example:

var myProbabilityTable = new Dictionary<char, uint> {
    { 'A', 1 }, // 14.29%
    { 'B', 2 }, // 28.57%
    { 'C', 4 }, // 57.14%
};

I am trying to figure out an efficient way to invert this so that one can pass float-based values (which might be more natural to some). Here is an example of what I might mean:

var myProbabilityTable = new Dictionary<char, float> {
    { 'A', 0.1429f }, // 1
    { 'B', 0.2857f }, // 2
    { 'C', 0.5714f }, // 4
};
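
For reference, the second table can be derived from the first by normalizing each weight over the total (1 + 2 + 4 = 7). A minimal sketch of that forward direction, with a hypothetical helper name not taken from the original class:

using System.Collections.Generic;
using System.Linq;

static class WeightMath {
    // Hypothetical helper: converts integer weights into the float
    // probabilities shown above (1/7 ≈ 0.1429, 2/7 ≈ 0.2857, 4/7 ≈ 0.5714).
    public static Dictionary<T, float> Normalize<T>(IReadOnlyDictionary<T, uint> weights) {
        float total = weights.Values.Aggregate(0f, (acc, w) => acc + w);
        return weights.ToDictionary(pair => pair.Key, pair => pair.Value / total);
    }
}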

So far the best I have been able to come up with is an approach that takes a minimum of two passes: the first to get the total weight and the second to create the new dictionary. Is there a better way that one could accomplish this? Am I missing any edge cases that might cause things to "blow up"?

My Attempt:

using System;
using System.Collections.Generic;
using System.Linq;

class Program {
    static void Main(string[] args) {
        var floatWeights = new Dictionary<string, float> {
            { "A", 25.0f },
            { "B", 60.0f },
            { "C", 15.0f },
        };
        var integerWeights = ConvertFloatWeightsToInteger(floatWeights);

        foreach(var item in integerWeights) {
            Console.WriteLine(item);
        }
    }

    static Dictionary<T, int> ConvertFloatWeightsToInteger<T>(Dictionary<T, float> elementWeights) {
        var totalWeight = 0f;

        foreach (var weight in elementWeights.Values) {
            if (float.IsNegative(weight) || float.IsSubnormal(weight)) {
                throw new ArgumentOutOfRangeException(actualValue: weight, message: "weight must be a positive normal floating-point number", paramName: nameof(weight));
            }

            totalWeight += weight;
        }

        return elementWeights.ToDictionary(
            keySelector: k => k.Key,
            elementSelector: e => ((int)((e.Value / totalWeight) * 1000000000))
        );
    }
}
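
One edge-case note on the guard in the loop above: `float.IsNegative(float.NaN)` only inspects the sign bit, so a positive NaN passes the check and poisons `totalWeight`; `float.PositiveInfinity` passes as well; and an all-zero table leaves `totalWeight` at zero, so every division produces NaN. A sketch of stricter validation (the helper name is illustrative, not from the original code):

// Illustrative guard; uses the same float.Is* family the original code relies on.
static void ValidateWeight(float weight) {
    if (!float.IsFinite(weight)) { // rejects NaN and both infinities
        throw new ArgumentOutOfRangeException(nameof(weight), weight, "weight must be finite");
    }
    if (float.IsNegative(weight) || float.IsSubnormal(weight)) {
        throw new ArgumentOutOfRangeException(nameof(weight), weight, "weight must be a positive normal number");
    }
}

After the summing loop one would also want to verify `totalWeight > 0f` before dividing, since a table of all-zero weights passes the per-element checks.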
Kittoes0124
  • It would be awesome if you could provide a [mcve] including inputs defined in code **and the expected results for those inputs**. – mjwills Jul 03 '19 at 14:08
  • I don't get why you're multiplying by `1000000000` –  Jul 03 '19 at 14:09
  • Because (int)(0.1234567890) will return 0 otherwise. – Tanveer Badar Jul 03 '19 at 14:17
  • @Amy It seemed like the proper thing to do while testing; converting small floats to int would result in zero without "amplifying" the value. The largest possible integer is 10 digits long and so I chose 1 billion as the multiplier. – Kittoes0124 Jul 03 '19 at 14:19
  • So your code works as expected, and you just seek performance improvements? –  Jul 03 '19 at 14:30
  • This algorithm is O(2n), so you will always need the two passes through the array. Anything else that you find that may look more efficient is hiding this functionality from you. Personally, I would make it more readable by converting the first loop into a Linq statement, but besides that, nothing else. – Miguel Mateo Jul 03 '19 at 14:32
  • @Amy It seems to work, but one isn't necessarily skilled enough at doing maths to be able to prove it; there also might be edge cases that I failed to account for. Any sort of improvement would be nice. – Kittoes0124 Jul 03 '19 at 14:34
  • @MiguelMateo: O(2n) meaning "two passes" is nonsense. –  Jul 03 '19 at 14:59
  • Why do you worry about having two passes ? A single pass might be more expensive than two independent ones, you don't know. In terms of asymptotic complexity, the number of passes does not matter (provided it is a constant). In this case, two passes are unavoidable because you can't determine the scaling factor without seeing all values once. –  Jul 03 '19 at 15:02
  • Instead of the magic number `1000000000`, I would suggest using `int.MaxValue` for clarity of purpose. – L. Scott Johnson Jul 03 '19 at 15:05
  • @L.ScottJohnson Would that not cause issues with larger floating point values? One hasn't tested but imagines that accidental wrap-around could become an issue if I used `int.MaxValue`. – Kittoes0124 Jul 03 '19 at 15:16
  • @Kittoes0124 Depends on usage. Just going by the code presented, it's not a given. You can use `int.MaxValue / 2` or something similar if it is a problem in other code parts. I note that it also seems the weights are unsigned, so you could use that fact to avoid overflow as well (using an unsigned int with `int.MaxValue` ). – L. Scott Johnson Jul 03 '19 at 15:26
  • @YvesDaoust no, meaning you must have two passes regardless, one to calculate the total and the other to calculate each new element. If we want to simplify the end result further, a third pass finding the common denominator and a fourth pass dividing by it will be required as well (as it is, if I pass 0.1 and 0.3, I will get 250000 and 750000, instead of 1 and 3 for example). – Miguel Mateo Jul 03 '19 at 15:30
  • @MiguelMateo: I don't deny the two passes (read my comments), I deny the notation O(2n). –  Jul 03 '19 at 18:21
  • @MiguelMateo Could you post an answer that implements `a third pass finding the common denominator and a fourth pass dividing by it`? I'm currently trying to do so but am not exactly sure what you mean when there is more than 2 elements... – Kittoes0124 Jul 03 '19 at 19:26
  • Just curious what is the point of converting floating weights to an integer ? Also, this question is very vague. A float can never fully be converted to an integer, there will always be some amount of error due to floating-point precision errors – DollarAkshay Jul 03 '19 at 20:21
  • @AkshayLAradhya See my edits; hopefully that explains things adequately. – Kittoes0124 Jul 03 '19 at 20:49
  • When error in FP needs to be contained and integers are faster than floats (not usually anymore...) then fixed point math can help. That is basically what you are doing by multiplying with a number to erase the decimals. – Michael Dorgan Jul 03 '19 at 20:58
  • If the only purpose of this is so that this seems more natural to users, then requiring that the passed values have a total weight of 1.0 or 100.0 would be appropriate. – RobertBaron Jul 03 '19 at 21:36
  • @YvesDaoust read this (https://stackoverflow.com/questions/25777714/which-algorithm-is-faster-on-or-o2n), and familiarize yourself with algorithm theory, so you can understand the O notations. – Miguel Mateo Jul 03 '19 at 22:43
  • @Kittoes0124 you need to find the prime denominators for each number, then find the common denominators across all numbers, then divide each number by this common denominator and the answer is what you need. If I find time later today I will write it, but give it a try. – Miguel Mateo Jul 03 '19 at 22:47
  • so you got floats and want integers with the same ratio? see this: [Converting floating ratios to int](https://stackoverflow.com/a/56822747/2521214) it's single pass `O(n)` where `n` is the number of floats to convert ... it's doable completely on integer math without rounding if the floats do not have too different exponents. If you want the smallest result then the division by GCD of the result is much slower than the conversion itself ... if I see it right then the final complexity would be something like `O(n.log(n))` – Spektre Jul 04 '19 at 05:45
  • PS spending more time on getting smaller values can even speed up your entire App as the `ProbabilisticEnumerator` (what ever that is) could be faster on smaller values but that is just my educated guess ... – Spektre Jul 04 '19 at 05:51
  • @MiguelMateo: thanks for the "lesson". The very first comment reads "There's no such thing as O(2N)", with reason. –  Jul 04 '19 at 06:36
  • @Spektre please note that he is trying to get the ratio of the float numbers in integers, so you will need the sum first, in order to get the ratios later, hence O(2n). To get the smallest result, the complexity is actually O(3n+n*sqr(n)) ... not log(n), but square root instead. – Miguel Mateo Jul 04 '19 at 14:14
  • @MiguelMateo I do not need to sum at all as floats have a common denominator already (they are multiples of a power of 2, the exponent) so finding the integer ratio from floats is `O(N)`: it's a simple `max` of the float exponents and the rest is `O(1)`. The division by GCD is prime decomposition very similar to SoE which is `O(N.log(log(n)))` but division is applied multiple times hence my educated estimate (might be wrong) `O(N.log(n))`. If you add the sqrt ending condition it would be `O(N.log(sqrt(n)))`. Note `N` is the number of floats and `n` is the `min` value of the converted ints before GCD division – Spektre Jul 04 '19 at 14:53
  • @MiguelMateo with your approach you are right but that has one serious drawback and that is the target integer bit-width limitation ... converting even a 32 bit float can lead to big integers and once you start converting them into a common denominator you will most likely run out of range before you start ... – Spektre Jul 04 '19 at 14:56
  • 1
    @Spektre that is the original question, assuming there is a collection of floats ... I cannot control those floats, but I am assuming that Kittoes0124 is aware. Please put in the answer just what algorithm you would do that is O(n), without finding GCD, because I do not see it – Miguel Mateo Jul 05 '19 at 00:33
  • 1
    @MiguelMateo I added [edit1] ...its all the same I just added `for` loops instead of hard-coded lines but all equations and code is the same ... Also I added binary example into bullet #5 of original text so its clear what and how its done – Spektre Jul 05 '19 at 08:28
  • @Spektre, if I read correctly, your algorithm does three cycles, from 1 to N, doing three different things in each cycle ... that is my friend an O(3N) algorithm. However this algorithm provides the lowest integer numbers that meet the same ratios as the floats provided, so that should be the answer to this question. – Miguel Mateo Jul 05 '19 at 09:11
  • @MiguelMateo complexity `O(3N)` is the same as `O(N)` only the implicit constant time is different ... complexities describe growth of runtime, space usage, or whatever depending on the input `N` not the actual time ... that you need to measure anyway for a specific `N` and specific environment otherwise its meaningless ... it does not matter if you run 3 times the same loop or just once with slower iteration that is why the constant is discarded from complexity ... – Spektre Jul 05 '19 at 10:20
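
The "smallest result" discussed in the comments comes down to dividing the converted integer weights by their collective GCD, so that e.g. { 250000, 750000 } reduces to { 1, 3 }. A minimal sketch (helper names are illustrative, not part of the original code):

using System.Linq;

static class WeightReducer {
    // Euclidean GCD of two non-negative integers.
    static int Gcd(int a, int b) {
        while (b != 0) {
            (a, b) = (b, a % b);
        }
        return a;
    }

    // Divides all weights by their collective GCD, preserving the ratios.
    public static int[] Reduce(int[] weights) {
        int gcd = weights.Aggregate(Gcd);
        return gcd <= 1 ? weights : weights.Select(w => w / gcd).ToArray();
    }
}

This is the fourth-pass reduction Miguel Mateo describes; note Spektre's caveat that the intermediate integers must fit the target bit width before the reduction can help.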

0 Answers