there are only four dots of the same size
Well, in your specific case both the x- and y-values contain only [0,1,1,0,..], so bubble_plot() can only show you bubbles positioned at [0,0], [0,1], [1,0], [1,1]. The different sizes show how often each combination of values in columns 'A' and 'B' occurs, i.e. the size of the bubble at [1,0] shows in how many rows there was a 1 in column 'A' and a 0 in column 'B'.
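To make the meaning of the bubble sizes concrete, the same counts can be sketched with pandas alone. This assumes the bubble size tracks the number of rows per (A, B) combination, as described above; the small DataFrame df_toy is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical toy data (not the data from the question)
df_toy = pd.DataFrame({'A': [0, 1, 1, 0, 1, 1],
                       'B': [0, 0, 1, 0, 0, 1]})

# Number of rows for each (A, B) combination:
# rows of the table are values of 'A', columns are values of 'B'
counts = pd.crosstab(df_toy['A'], df_toy['B'])
print(counts)

# Rows with a 1 in 'A' and a 0 in 'B', i.e. what the bubble at [1,0] would represent
print(counts.loc[1, 0])
```

A combination that never occurs (here A=0 with B=1) gets a count of 0, which would correspond to a missing or minimal bubble at that position.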
If you add an import matplotlib.pyplot as plt and a plt.colorbar(), you'll be able to see that the colours mean the same as the sizes:
import pandas as pd
import numpy as np
from bubble_plot.bubble_plot import bubble_plot
import matplotlib.pyplot as plt
np.random.seed(2020)
A = np.random.choice([0,1],size=50)
B = np.random.choice([0,1],size=50)
df = pd.DataFrame({'A':A, 'B':B})
bubble_plot(df, x='A', y='B')
plt.colorbar()
plt.show()
And if you were to use h = plt.hist2d(df['A'], df['B'], bins=2) instead of the bubble_plot(), you could use print(h[0]) to get the distribution information:
[[13. 15.]
[14. 8.]]
or, normalised, with print(h[0]/h[0].sum()):
[[0.26 0.3 ]
[0.28 0.16]]
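i.e. in 16% of the dataset a 1 in df['A'] coincides with a 1 in df['B'] (the first index of h[0] is the bin of df['A'], the second the bin of df['B'], so a 1 in df['A'] together with a 0 in df['B'] accounts for 28%).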
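The orientation of the returned array can be checked without plotting. The following is a minimal sketch using np.histogram2d, which plt.hist2d builds on and which follows the same convention: x is histogrammed along the first axis, y along the second. The arrays x and y are made-up toy data, standing in for df['A'] and df['B'].

```python
import numpy as np

# Hypothetical toy data: x plays the role of df['A'], y of df['B']
x = np.array([0, 1, 1, 0, 1, 1])
y = np.array([0, 0, 1, 0, 0, 0])

# H[i, j] counts the samples with x in bin i and y in bin j
H, xedges, yedges = np.histogram2d(x, y, bins=2)
print(H)

# Samples with x == 1 and y == 0
print(H[1, 0])

# Samples with x == 1 and y == 1
print(H[1, 1])
```

So to look up a particular combination, index with the value of 'A' first and the value of 'B' second.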