Background
I've written a program designed to calculate the odds of each outcome of a die roll, or of a combination of die rolls. Specifically, for rolls like "add together two six-sided dice" (a.k.a. the Catan roll), probabilities are calculated by saving mappings between each individual outcome and the number of possible rolls that can produce that outcome. In this example, the data structure is a `TreeMap<Integer, BigInteger>`, which takes this form:
{
[2, 1],
[3, 2],
[4, 3],
[5, 4],
[6, 5],
[7, 6],
[8, 5],
[9, 4],
[10, 3],
[11, 2],
[12, 1]
}
Before anyone asks, the use of `BigInteger` in this context is NOT overkill: the program is designed to handle any possible roll that might be presented to it, and rolls like 100d6 (add together the rolls of 100 six-sided dice) very quickly produce very large counts that I don't want approximated as a `double`.
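For context, here is a minimal sketch of how such a map could be built, by convolving one die at a time into the distribution. The `roll` helper and class name are hypothetical, since the question doesn't show the construction code:

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

public class DiceDistribution {
    // Sketch: convolve `count` copies of an s-sided die into an
    // outcome -> number-of-ways map. Hypothetical helper; the real
    // construction code isn't shown in the question.
    static TreeMap<Integer, BigInteger> roll(int count, int sides) {
        TreeMap<Integer, BigInteger> dist = new TreeMap<>();
        dist.put(0, BigInteger.ONE); // zero dice: one way to roll a total of 0
        for (int d = 0; d < count; d++) {
            TreeMap<Integer, BigInteger> next = new TreeMap<>();
            for (Map.Entry<Integer, BigInteger> e : dist.entrySet()) {
                for (int face = 1; face <= sides; face++) {
                    // each existing partial total extends by every face of the new die
                    next.merge(e.getKey() + face, e.getValue(), BigInteger::add);
                }
            }
            dist = next;
        }
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(roll(2, 6)); // the "Catan roll" map shown above
    }
}
```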
As part of the interface for this program, I've decided that I want stats about these rolls to be queryable, and one such stat is the median of the roll. My current algorithm sums up the trials representing all outcomes at or below a given outcome, starting with the lowest outcome, and reports an outcome as the median once the running total reaches at least 50% of the trials.
This is how I've written the code.
```java
// Is filled with values during object construction
TreeMap<Integer, BigInteger> probabilityMap = new TreeMap<>();

// Memoization, to at least make sure we only make this calculation once
private Integer memoizedMedian = null;

public int getMedian() {
    if (memoizedMedian == null) {
        BigInteger trials = BigInteger.ZERO;
        BigInteger totalTrials = numOfTrials();
        for (Map.Entry<Integer, BigInteger> entry : probabilityMap.entrySet()) {
            // We're guaranteed to be iterating in ascending key order, due to how TreeMaps work
            trials = trials.add(entry.getValue());
            // Compare 2 * trials against totalTrials in exact integer arithmetic,
            // rather than converting to double and losing precision on huge counts
            if (trials.multiply(BigInteger.TWO).compareTo(totalTrials) >= 0) {
                memoizedMedian = entry.getKey();
                break;
            }
        }
        // If we didn't find it, something went wrong with the object initialization
        if (memoizedMedian == null)
            throw new RuntimeException("Probability Map was not properly initialized");
    }
    return memoizedMedian;
}
```
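As a quick sanity check of the cumulative-count idea, here is a standalone sketch (not the actual class above, which has construction code I haven't shown) that walks the 2d6 map in key order and stops once the running count reaches half the trials:

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

public class MedianCheck {
    // Same cumulative-scan idea as getMedian above, but self-contained
    static int median(TreeMap<Integer, BigInteger> map) {
        BigInteger total = map.values().stream().reduce(BigInteger.ZERO, BigInteger::add);
        BigInteger running = BigInteger.ZERO;
        for (Map.Entry<Integer, BigInteger> e : map.entrySet()) {
            running = running.add(e.getValue());
            // exact integer comparison: running / total >= 1/2
            if (running.multiply(BigInteger.TWO).compareTo(total) >= 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("empty distribution");
    }

    public static void main(String[] args) {
        // The 2d6 ("Catan") map from the question
        TreeMap<Integer, BigInteger> catan = new TreeMap<>();
        int[][] pairs = {{2,1},{3,2},{4,3},{5,4},{6,5},{7,6},{8,5},{9,4},{10,3},{11,2},{12,1}};
        for (int[] p : pairs) catan.put(p[0], BigInteger.valueOf(p[1]));
        System.out.println("median = " + median(catan)); // prints "median = 7"
    }
}
```

The cumulative counts are 1, 3, 6, 10, 15, 21, …; 21 of 36 trials is the first to reach half, so the scan stops at the key 7.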
The Problem
As written, this code works: it finds the median for any set of values I pass to it.
The problem, however, is that it can be slow. Consider this (exaggerated) dataset to see why:
{
[1,1],
[2,1],
[3,1],
[4,1],
[5,1],
[6,1],
[7,1],
[8,1],
[9,1],
[10,11]
}
Obviously, 10 is the median of this dataset, but the algorithm won't discover that until it scans the very last entry in the map, and for larger, more complicated probability maps, reaching that point could take a while.
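That worst case can be demonstrated with a standalone sketch: counting how many entries the cumulative scan visits on the dataset above shows that every one of the 10 entries is touched before the running count reaches half of the 20 trials (the class and method names here are hypothetical):

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

public class WorstCaseScan {
    // Counts how many entries the cumulative scan visits before stopping
    static int entriesScanned(TreeMap<Integer, BigInteger> map) {
        BigInteger total = map.values().stream().reduce(BigInteger.ZERO, BigInteger::add);
        BigInteger running = BigInteger.ZERO;
        int scanned = 0;
        for (Map.Entry<Integer, BigInteger> e : map.entrySet()) {
            scanned++;
            running = running.add(e.getValue());
            if (running.multiply(BigInteger.TWO).compareTo(total) >= 0) break;
        }
        return scanned;
    }

    public static void main(String[] args) {
        // The exaggerated dataset from the question: outcomes 1-9 once each, 10 eleven times
        TreeMap<Integer, BigInteger> map = new TreeMap<>();
        for (int k = 1; k <= 9; k++) map.put(k, BigInteger.ONE);
        map.put(10, BigInteger.valueOf(11));
        // The first 9 entries contribute only 9 of 20 trials,
        // so the scan cannot stop until the last entry
        System.out.println("entries scanned: " + entriesScanned(map)); // prints "entries scanned: 10"
    }
}
```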
So I'd like to improve the algorithm to handle these kinds of datasets more gracefully, but I'm not sure what approach to take.
What kind of changes should I make to my algorithm to better calculate the median of datasets like this? I'm open to changes to the underlying data structure as well, but any change should come with proper justification.