Matplotlib plot pmf from list of 2D numpy arrays

Question

I have a dataset from my simulations where I combine the results from each simulation seed into a bigger list using bl.extend(df['column'].tolist()).

I'm also running several simulation scenarios, so I append each scenario to a list of lists.

Finally, I'm computing the Probability Mass Function (PMF) of each list as follows (from "How to plot a PMF of a sample?"):

plot_pmf = []  # one (n, 2) array of [value, probability] rows per scenario
for idx, sublist in enumerate(pmf_list):
    val, cnt = np.unique(sublist, return_counts=True)
    pmf = cnt / float(len(sublist))
    plot_pmf.append(np.column_stack((val, pmf)))

The issue is that I end up with a list of numpy arrays which I don't know how to plot. The minimum code to reproduce the problem is the following:

import numpy as np

list1 = np.empty([2, 2])
list2 = np.empty([2, 2])
list3 = np.empty([2, 2])
bl = [] # big list
bl.append(list1)
bl.append(list2)
bl.append(list3)
print(bl)

I can plot using plt.hist(bl[0]) but it doesn't give me the right results. See plot attached for the following list.

<type 'numpy.ndarray'>
[[0.00000000e+00 1.91734780e-01]
 [1.00000000e+00 2.94277080e-02]
 [2.00000000e+00 3.28276369e-01]
 [3.00000000e+00 4.43357154e-01]
 [4.00000000e+00 3.54294582e-03]
 [5.00000000e+00 1.57306794e-03]
 [6.00000000e+00 2.00530733e-03]
 [7.00000000e+00 2.95245485e-05]
 [8.00000000e+00 2.24386568e-05]
 [9.00000000e+00 2.83435665e-05]
 [1.00000000e+01 1.18098194e-06]
 [1.20000000e+01 1.18098194e-06]]

Formatting the y-values I get:

0.1944084241
0.0415880165
0.3480178394
0.4031723062
0.0050902199
0.0033411939
0.0040175705
0.0001480127
0.0001031961
0.0001008373
0.0000058969
0.0000011794
0.0000047175
0.0000005897

These are very different from the y-values on the histogram plot.

How do you want to plot it? What should be your x-axis and y-axis? — harvpan, May 30 '18 at 21:27
What is not right about the histogram? Looks correct to me. I don't understand what it has to do with pmf's though. — Stop harming Monica, May 30 '18 at 21:44
@HarvIpan the x-values are the values on the left, y-values on the right. — , May 30 '18 at 21:48
The code that you have provided, does not produce the array that you have printed. — harvpan, May 30 '18 at 21:55
@John I do not understand what the histogram has to do with it. The histogram is correct, isn't it? — Stop harming Monica, May 30 '18 at 21:59
@Goyo the y-values in the histogram are not correct, plus that orange bar is extra. I've updated the value on my question above. — , May 30 '18 at 23:37

score 0 · Answer 1 · answered May 31 '18 at 06:02

Does the following graph look right?

import matplotlib.pyplot as plt
import numpy as np

X = np.array([[0.00000000e+00, 1.91734780e-01],
              [1.00000000e+00, 2.94277080e-02],
              [2.00000000e+00, 3.28276369e-01],
              [3.00000000e+00, 4.43357154e-01],
              [4.00000000e+00, 3.54294582e-03],
              [5.00000000e+00, 1.57306794e-03],
              [6.00000000e+00, 2.00530733e-03],
              [7.00000000e+00, 2.95245485e-05],
              [8.00000000e+00, 2.24386568e-05],
              [9.00000000e+00, 2.83435665e-05],
              [1.00000000e+01, 1.18098194e-06],
              [1.20000000e+01, 1.18098194e-06],])

plt.bar(x=X[:, 0], height=X[:, 1])
plt.show()

If you already have the first column as the possible values of the random variable, and the second column as the corresponding probability values, you could use a bar plot to visualize the PMF.
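Since the question has a list of such arrays, the same idea can be applied to each one in turn. The following is only a sketch: the helper name plot_pmf_list, the subplot layout, and the small arrays in bl are made up for illustration and are not the data from the question.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_pmf_list(pmf_arrays):
    """Draw one bar chart per (n, 2) array of [value, probability] rows."""
    # squeeze=False keeps axes 2D even when there is a single array.
    fig, axes = plt.subplots(1, len(pmf_arrays), sharey=True, squeeze=False)
    for idx, (ax, arr) in enumerate(zip(axes[0], pmf_arrays)):
        arr = np.asarray(arr)
        ax.bar(arr[:, 0], arr[:, 1])  # column 0: values, column 1: probabilities
        ax.set_title("scenario {}".format(idx))
        ax.set_xlabel("value")
    axes[0, 0].set_ylabel("probability")
    return fig, axes[0]


# Made-up example data standing in for the real simulation output.
bl = [
    np.array([[0.0, 0.5], [1.0, 0.3], [2.0, 0.2]]),
    np.array([[0.0, 0.1], [1.0, 0.6], [3.0, 0.3]]),
]
fig, axes = plot_pmf_list(bl)
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Sharing the y-axis is a design choice that makes scenarios directly comparable; drop sharey=True if the probabilities differ by orders of magnitude.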

The histogram plot function plt.hist is for a vector of observed values. For example,

import matplotlib.pyplot as plt
import numpy as np

# %matplotlib inline  # Jupyter/IPython only; leave commented out in a plain script

np.random.seed(0)
plt.hist(np.random.normal(size=1000))
plt.show()
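As far as I can tell, this is also why plt.hist(bl[0]) looks wrong: given a 2D array, plt.hist treats each column as a separate dataset of observations, so the values and the probabilities are each binned and drawn in their own colour. If you do want to go through plt.hist, you would pass the raw observations instead. The sketch below assumes integer-valued samples (generated here at random, not taken from the question) and uses unit-width bins plus weights so that the bar heights match the PMF computed with np.unique.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up raw observations (non-negative integers), not the question's data.
rng = np.random.RandomState(0)
samples = rng.poisson(lam=2.0, size=1000)

# PMF via np.unique, as in the question.
val, cnt = np.unique(samples, return_counts=True)
pmf = cnt / float(len(samples))

# Same PMF via a histogram: one unit-width bin centred on each integer,
# and weights that turn raw counts into relative frequencies.
bins = np.arange(samples.min() - 0.5, samples.max() + 1.5)
weights = np.ones(len(samples)) / len(samples)
fig, ax = plt.subplots()
heights, edges, _ = ax.hist(samples, bins=bins, weights=weights)
```

The non-empty bins in heights should agree with pmf; integers that never occur in the sample simply show up as empty bins.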
