
I am trying to train a model to estimate a GMM. However, the means of the GMM are recomputed on every iteration from a mean_placement parameter. I am following the solution provided here; I'll copy and paste the original code:

import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

import torch
from torch import nn
from torch import optim
import torch.distributions as D

num_layers = 8
weights = torch.ones(8,requires_grad=True)
means = torch.tensor(np.random.randn(8,2),requires_grad=True)
stdevs = torch.tensor(np.abs(np.random.randn(8,2)),requires_grad=True)

parameters = [weights, means, stdevs]
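# weights, means and stdevs are leaf tensors created with requires_grad=True, so the optimizer can update them directly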
optimizer1 = optim.SGD(parameters, lr=0.001, momentum=0.9)

num_iter = 10001
for i in range(num_iter):
    mix = D.Categorical(weights)
    comp = D.Independent(D.Normal(means,stdevs), 1)
    gmm = D.MixtureSameFamily(mix, comp)

    optimizer1.zero_grad()
    x = torch.randn(5000,2)#this can be an arbitrary x samples
    loss2 = -gmm.log_prob(x).mean()#-densityflow.log_prob(inputs=x).mean()
    loss2.backward()
    optimizer1.step()

    print(i, loss2)

What I would like to do is this:

num_layers = 8
weights = torch.ones(8,requires_grad=True)
means_coef = torch.tensor(10.,requires_grad=True)
means = torch.tensor(torch.dstack([torch.linspace(1,means_coef.detach().item(),8)]*2).squeeze(),requires_grad=True)
stdevs = torch.tensor(np.abs(np.random.randn(8,2)),requires_grad=True)
parameters = [means_coef]
optimizer1 = optim.SGD(parameters, lr=0.001, momentum=0.9)

num_iter = 10001
for i in range(num_iter):
    means = torch.tensor(torch.dstack([torch.linspace(1,means_coef.detach().item(),8)]*2).squeeze(),requires_grad=True)
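    # note: .detach().item() and the torch.tensor(...) wrapper re-create means as a brand-new leaf each iteration, so the graph never reaches means_coef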

    mix = D.Categorical(weights)
    comp = D.Independent(D.Normal(means,stdevs), 1)
    gmm = D.MixtureSameFamily(mix, comp)

    optimizer1.zero_grad()
    x = torch.randn(5000,2)#this can be an arbitrary x samples
    loss2 = -gmm.log_prob(x).mean()#-densityflow.log_prob(inputs=x).mean()
    loss2.backward()
    optimizer1.step()

    print(i, means_coef)
    print(means_coef)


However, in this case the parameter is not updated and its grad is always None. Any ideas how to fix this?

Saam
  • What are you trying to update... your optimizer tries to update the means_coef but your computation graph does not depend on it. – D. ACAR Apr 06 '22 at 11:23
  • @D.ACAR I am trying to update the means_coef. It controls how spread or concentrated the means are. The means are created based on it so I was hoping by including the creation of the whole gmm in each iteration it would also be included in the computation graph. – Saam Apr 06 '22 at 11:30
  • can you change `torch.linspace(1,means_coef.detach().item(),8)]*2)` to `torch.linspace(1,means_coef,8)]*2)` – D. ACAR Apr 06 '22 at 11:31
  • detach() returns a copy of tensor with requires_grad = False so the comp. graph becomes independent of means_coef – D. ACAR Apr 06 '22 at 11:33
  • Yes I started without the detach, but then I'd get this error: ```linspace(): argument 'end' (position 2) must be Number, not Tensor``` – Saam Apr 06 '22 at 11:34
  • I haven't used linspace, but it seems it's automatically differentiable, so why not do the following: instead of means_coef, create a tensor with linspace, set its requires_grad = True, and update that. You don't use means_coef anywhere else. – D. ACAR Apr 06 '22 at 11:39
  • The thing is, I need the means to be exactly on the intervals I set according to the means_coef, so the space between them is always equal. If I create the tensor once and pass it as a parameter to the optimizer, they will be all over the place. – Saam Apr 06 '22 at 11:45
  • I was checking the source for that. If the elements in the interval are a transformation, as they seem to be, then your start and end will actually be updated. – D. ACAR Apr 06 '22 at 11:47
  • I'm not sure I follow this last comment. I can think of no other way to dynamically space the means and let the differentiation process just contract or expand the components. – Saam Apr 06 '22 at 11:52
  • `final = Tensor(final, requires_grad=True) initial = Tensor(initial, requires_grad=True) out = range(1, n+1) * (final-initial)/n` then only final and initial will be updated (see the sketch after these comments). – D. ACAR Apr 06 '22 at 11:57
  • if you make `initial = Tensor(initial, requires_grad=False)` then only final will be updated, and every other point will be placed with the same space between them. – D. ACAR Apr 06 '22 at 12:01
  • I couldn't find the source for linspace, but I guess it must be something like what I have written there. – D. ACAR Apr 06 '22 at 12:03
  • Ah I understand your point now. Thanks, it would work as a workaround for this specific case, but I need to solve the mean_coef parameter problem because it's not always going to be linear. So let's say I'm working in logspace instead of linspace and I'd want to change the base of the logspace. Then I'd be having the same issue. – Saam Apr 06 '22 at 12:15
  • The same still applies: instead of the range(), put an x there which represents any grid you want; then you can scale it by updatable scaling parameters, like the final tensor in my example. – D. ACAR Apr 06 '22 at 12:24
  • Maybe I'm not getting your point. The logspace puts the points on a logarithmic scale, so you'd need an initial, a final, and a base to be able to generate points. Here you did eliminate the need for linspace by doing the operation yourself instead of calling it. But that would only change the start/finish points for logspace, not the base. And it will still give the error I put earlier. – Saam Apr 06 '22 at 12:50
  • Something like this: ``` initial = 0. final = 10. bse = 3. final = torch.tensor(final, requires_grad=True) initial = torch.tensor(initial, requires_grad=True) base = torch.tensor(bse, requires_grad=True) out = torch.range(1, n+1) * (final-initial)/n torch.logspace(0,10,8,base=base,requires_grad = True)``` – Saam Apr 06 '22 at 12:51

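A minimal sketch of the fix discussed in the comments above (an illustration assuming standard PyTorch autograd behaviour, not code from the original post): building the evenly spaced means from means_coef with differentiable tensor ops, instead of torch.linspace(1, means_coef.detach().item(), 8), keeps means_coef in the computation graph so its grad is populated.

import torch

n_cell = 8
means_coef = torch.tensor(10., requires_grad=True)

# Differentiable replacement for torch.linspace(1, means_coef, n_cell):
# an evenly spaced grid running from 1 to the trainable means_coef.
grid = torch.arange(n_cell, dtype=torch.float32) / (n_cell - 1)  # 0 ... 1
line = 1 + (means_coef - 1) * grid                               # 1 ... means_coef
means = torch.stack([line, line], dim=1)                         # shape (8, 2), one column per dim

means.sum().backward()
print(means_coef.grad)  # no longer None: the graph depends on means_coef

Inside the training loop, this expression would replace the torch.tensor(torch.dstack([...]), requires_grad=True) re-creation of means, with parameters = [means_coef] passed to the optimizer as before.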
1 Answer


According to your instructions I have rewritten your model. If you run it, you can see that all the parameters change as the model is optimized. I have also provided the computation graph of the model at the end. You can simply modify the GMM class as needed if you want to make a new one.

import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

import torch
from torch import nn
from torch import optim
import torch.distributions as D

class GMM(nn.Module):
    
    def __init__(self, weights, base, scale, n_cell=8, shift=0, dim=2):
        super(GMM, self).__init__()
        self.weight = nn.Parameter(weights)
        self.base = nn.Parameter(base)
        self.scale = nn.Parameter(scale)
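        # Fixed (non-trainable) integer grid 1..n_cell; the trainable base and scale map it onto the component means.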
        self.grid = torch.arange(1, n_cell+1)
        self.shift = shift
        self.n_cell = n_cell
        self.dim = dim
    
    def trsf_grid(self):
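        # Means grid: log_base(scale * k + shift) for k = 1..n_cell, i.e. log(.)/log(base), expanded to shape (n_cell, dim).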
        trsf = (
            torch.log(self.scale * self.grid + self.shift) 
            / torch.log(self.base)
            ).reshape(-1, 1)
        return trsf.expand(self.n_cell, self.dim)
    
    def forward(self, x, std):
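        # Negative mean log-likelihood of x under the mixture whose component means come from the current grid.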
        means = self.trsf_grid()
        mix = D.Categorical(self.weight)
        comp = D.Independent(D.Normal(means, std), 1)
        gmm = D.MixtureSameFamily(mix, comp)
        return -gmm.log_prob(x).mean()

if __name__ == "__main__":
    weight = torch.ones(8)
    base = torch.tensor(3.)
    scale = torch.tensor(1.)
    stds = torch.tensor(np.abs(np.random.randn(8,2)),requires_grad=False)
    model = GMM(weight, base, scale)
    print(list(model.parameters()))
    
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    for i in range(1000):
        optimizer.zero_grad()
        x = torch.randn(5000,2)
        loss = model(x, stds)
        loss.backward()
        optimizer.step()
        
    print(list(model.parameters()))

In my case it returned the following parameters:

[Parameter containing:
tensor([1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True), Parameter containing:
tensor(3., requires_grad=True), Parameter containing:
tensor(1., requires_grad=True)]

[Parameter containing:
tensor([0.7872, 1.1010, 1.3390, 1.3757, 0.5122, 0.2884, 1.2597, 0.7597],
       requires_grad=True), Parameter containing:
tensor(3.3207, requires_grad=True), Parameter containing:
tensor(0.2814, requires_grad=True)]

which indeed shows that the parameters are updating. You can also see the computation graph below:

The computation graph
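As a follow-up note (not part of the original answer): because base and scale are nn.Parameters and trsf_grid rebuilds the means from them on every forward pass, the loss stays connected to both of them in the graph, which is why their gradients are no longer None. A quick way to inspect the fitted grid after training, using the GMM class above:

# Means implied by the optimized base/scale: log_base(scale * k + shift) for k = 1..n_cell.
with torch.no_grad():
    fitted_means = model.trsf_grid()
print(fitted_means)  # an (8, 2) tensor; both columns share the same log-spaced values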

D. ACAR
  • Thanks a lot! I did use it, but the same problem persists. See the code I'm using here: https://pastebin.com/VaFu0Bmp. I'm basically using the function you provided, only dividing the results by torch.log(base), which changes the base of the natural logarithm; and that is the parameter I need to differentiate over. – Saam Apr 06 '22 at 15:46
  • The link does not work; can you check it again, please? – D. ACAR Apr 06 '22 at 17:56
  • Weird, it does work for me. Anyway here is the code on two other websites: controlc.com/6f145b0e and rentry.co/6zn8g . Hope one of these works. Thanks again! – Saam Apr 10 '22 at 15:30
  • I will check it out this evening and get back to you. Sorry for the late response. – D. ACAR Apr 11 '22 at 06:23
  • So what you want to do is optimize the base and scale of a logarithmic grid. I am not sure what the rest of the transformations are, and I will be assuming that they are correct. Also, why do you stack two copies of the same tensor? Are you going to use the same tensor for both dims, or do you want a new base and scale to be fit for each dimension? – D. ACAR Apr 12 '22 at 06:46
  • Yes, both base and scale need to be optimized. I'm stacking them because the transformation should be the same for both dimensions. But if I can get the base and scale to optimize for the logspace, then I can easily extend it to the multi-dimensional case. – Saam Apr 12 '22 at 07:33
  • What I mean is: do you need one base and two scale factors for a two-dimensional case, right? – D. ACAR Apr 12 '22 at 07:44
  • Oh, now I get it. No, it's just one set of base/scale that will be repeated for the different dimensions, or a base/scale set for each dimension. But the major use case would be just one transformation applied to every dimension (a per-dimension variant is sketched after these comments). – Saam Apr 12 '22 at 07:50
  • Well, hopefully my internet provider fixed the network issue... I'll test and post the code in the evening. – D. ACAR Apr 12 '22 at 07:54
  • Awesome! Looking forward to it – Saam Apr 12 '22 at 07:59
  • Hey :) did you by any chance have the time to look into this problem? – Saam Apr 18 '22 at 11:46
  • I am really sorry, I had a hectic week; I am going to do it right now. – D. ACAR Apr 18 '22 at 13:39
  • This is amazing, thanks a lot! Really appreciate the time you took to make it work. – Saam Apr 19 '22 at 08:38
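A sketch of the per-dimension variant discussed in the last few comments (hypothetical, not code from the original answer): passing scale as a (dim,)-shaped tensor so each dimension gets its own scaling while sharing a single base.

class GMMPerDimScale(GMM):
    # Hypothetical extension of the GMM class above: scale holds one entry per dimension.
    def trsf_grid(self):
        grid = self.grid.reshape(-1, 1).float()   # (n_cell, 1)
        # (n_cell, 1) * (dim,) broadcasts to (n_cell, dim): one scale per column, shared base.
        return torch.log(self.scale * grid + self.shift) / torch.log(self.base)

model2d = GMMPerDimScale(torch.ones(8), torch.tensor(3.), torch.tensor([1., 2.]))

Because the parent __init__ already wraps scale in nn.Parameter, passing a length-2 tensor is enough; everything else in the training loop stays the same.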