Let's say I have a dataframe with 4 variables. I want to fit a mixture of gammas over all the variables and use the posterior to assign each observation to a cluster. I'm guessing I will need some sort of multivariate gamma distribution? But how would I go about this?
Here is some PyMC3 code as an example with a single variable, fitting a mixture of two gammas (the prior parameters are arbitrary):
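import numpy as np
import pymc3 as pm

# `data` is assumed to be a 1-D array of positive observations
with pm.Model() as m:
    # mixture weights over the two components
    p = pm.Dirichlet('p', a=np.ones(2))
    # shape (alpha) and rate (beta) of each gamma component
    alpha = pm.Gamma('alpha', alpha=1, beta=1, shape=2)
    beta = pm.Gamma('beta', alpha=1, beta=1, shape=2)
    # the two component distributions, one per entry of alpha/beta
    comp_dist = pm.Gamma.dist(alpha=alpha, beta=beta, shape=(2,))
    like = pm.Mixture('y', w=p, comp_dists=comp_dist, observed=data)
    trace = pm.sample(1000)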
So my question is, how would I extend this basic example to multiple variables? I assume that I need to define relationships between the variables somehow to encode them in the model? I feel that I understand the basics of mixture modelling, but at the same time feel that I am missing something pretty fundamental.
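To make the question concrete, here is a minimal NumPy sketch of the likelihood I think I am after. It rests on an assumption that is mine and may be exactly what I am missing: within a cluster the variables are independent gammas, so each cluster's density is just a product over variables. The helper names (`gamma_logpdf`, `mixture_loglik`) and the synthetic parameters are made up for illustration, and the parameters are fixed rather than sampled.

```python
import math
import numpy as np

def gamma_logpdf(x, alpha, beta):
    """Log density of Gamma(shape=alpha, rate=beta), the same parametrisation as pm.Gamma."""
    lgamma = np.vectorize(math.lgamma)
    return alpha * np.log(beta) - lgamma(alpha) + (alpha - 1.0) * np.log(x) - beta * x

def mixture_loglik(X, w, alpha, beta):
    """X: (N, D) positive data; w: (K,) weights; alpha, beta: (K, D).
    Returns the total log-likelihood and the (N, K) cluster responsibilities."""
    # log density of every observation under every cluster and variable: (N, K, D)
    logp = gamma_logpdf(X[:, None, :], alpha[None, :, :], beta[None, :, :])
    # ASSUMPTION: independence within a cluster, so sum log densities over variables
    log_joint = np.log(w)[None, :] + logp.sum(axis=-1)  # (N, K)
    # log-sum-exp over clusters for numerical stability
    m = log_joint.max(axis=1, keepdims=True)
    log_marg = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    resp = np.exp(log_joint - log_marg[:, None])
    return log_marg.sum(), resp

# Synthetic example: 2 clusters, 4 variables (values chosen arbitrarily)
rng = np.random.default_rng(0)
alpha_true = np.array([[2.0, 2.0, 2.0, 2.0], [9.0, 9.0, 9.0, 9.0]])
beta_true = np.ones((2, 4))
z = rng.integers(0, 2, size=200)                      # true cluster labels
X = rng.gamma(shape=alpha_true[z], scale=1.0 / beta_true[z])
total, resp = mixture_loglik(X, np.array([0.5, 0.5]), alpha_true, beta_true)
clusters = resp.argmax(axis=1)                        # hard assignment per observation
```

If this independence assumption is acceptable, then I suppose no multivariate gamma is needed and the open question becomes how to express the same product structure in PyMC3. If the variables are correlated within a cluster, this sketch does not capture that.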