
How do I create a joint model that shares parameters between a Knowledge Graph Embedding (KGE) model, TuckER (given below), and GloVe (assume a co-occurrence matrix along with its dimensions is already available) in PyTorch?

In other words, the joint model must obey the criterion of the CMTF (Coupled Matrix and Tensor Factorizations) framework, and the weights of the two embeddings must be tied during training. The problem is that the KGE model expects a triple (subject, relation, object), whereas GloVe expects a co-occurrence matrix. Additionally, their loss functions are computed differently.

import numpy as np
import torch
from torch.nn.init import xavier_normal_

class TuckER(torch.nn.Module):
    def __init__(self, d, d1, d2, **kwargs):
        super(TuckER, self).__init__()

        # Entity and relation embeddings, plus the core tensor W of the Tucker decomposition.
        self.E = torch.nn.Embedding(len(d.entities), d1)
        self.R = torch.nn.Embedding(len(d.relations), d2)
        self.W = torch.nn.Parameter(torch.tensor(np.random.uniform(-1, 1, (d2, d1, d1)),
                                    dtype=torch.float, device="cuda", requires_grad=True))

        self.input_dropout = torch.nn.Dropout(kwargs["input_dropout"])
        self.hidden_dropout1 = torch.nn.Dropout(kwargs["hidden_dropout1"])
        self.hidden_dropout2 = torch.nn.Dropout(kwargs["hidden_dropout2"])
        self.loss = torch.nn.BCELoss()

        self.bn0 = torch.nn.BatchNorm1d(d1)
        self.bn1 = torch.nn.BatchNorm1d(d1)

    def init(self):
        xavier_normal_(self.E.weight.data)
        xavier_normal_(self.R.weight.data)

    def forward(self, e1_idx, r_idx):
        e1 = self.E(e1_idx)
        x = self.bn0(e1)
        x = self.input_dropout(x)
        x = x.view(-1, 1, e1.size(1))

        # Contract the relation embedding with the core tensor to get a per-relation matrix.
        r = self.R(r_idx)
        W_mat = torch.mm(r, self.W.view(r.size(1), -1))
        W_mat = W_mat.view(-1, e1.size(1), e1.size(1))
        W_mat = self.hidden_dropout1(W_mat)

        x = torch.bmm(x, W_mat)
        x = x.view(-1, e1.size(1))
        x = self.bn1(x)
        x = self.hidden_dropout2(x)
        # Score every entity as a candidate object and map the scores to probabilities.
        x = torch.mm(x, self.E.weight.transpose(1, 0))
        pred = torch.sigmoid(x)
        return pred

I know how to jointly train two pre-trained models by loading their state dicts, taking an instance, running it through both models, and then applying a feedforward layer on top. But I can't seem to figure this scenario out. Can you please suggest how I can achieve this?
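
For concreteness, here is a minimal sketch of the kind of coupling I have in mind, assuming the GloVe vocabulary is aligned with the KG entities and the co-occurrence matrix is already available. `GloVeHead` and its hyperparameters are placeholder names I introduce here; they are not part of the TuckER code:

import torch

class GloVeHead(torch.nn.Module):
    # Placeholder module: its word vectors are the same Embedding object as TuckER's E,
    # so the entity/word factor is shared (the "coupled" part of CMTF).
    def __init__(self, shared_embedding, x_max=100.0, alpha=0.75):
        super().__init__()
        self.W = shared_embedding                  # tied with tucker.E
        n, dim = shared_embedding.weight.shape
        self.W_tilde = torch.nn.Embedding(n, dim)  # separate context vectors
        self.b = torch.nn.Embedding(n, 1)          # word biases
        self.b_tilde = torch.nn.Embedding(n, 1)    # context biases
        self.x_max, self.alpha = x_max, alpha

    def forward(self, i_idx, j_idx, x_ij):
        # Standard GloVe weighted least-squares loss on a batch of nonzero cells.
        dot = (self.W(i_idx) * self.W_tilde(j_idx)).sum(dim=1)
        pred = dot + self.b(i_idx).squeeze(1) + self.b_tilde(j_idx).squeeze(1)
        weight = torch.clamp(x_ij / self.x_max, max=1.0) ** self.alpha
        return (weight * (pred - torch.log(x_ij)) ** 2).mean()

Since `self.W` here would be the very same `nn.Embedding` object as `tucker.E` (i.e. something like `glove = GloVeHead(tucker.E)`), gradients from both the BCE loss and the GloVe loss would accumulate on one weight matrix, which is the parameter tying I am after. What I can't work out is how to organise the training loop around the two very different inputs and losses.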


Important Resources:

  1. Code for TuckER - https://github.com/ibalazevic/TuckER
  • You can use `model.R` or `model.W` directly. Is that what you mean? – Natthaphon Hongcharoen May 28 '21 at 19:04
  • Sorry, can you elaborate on what you mean? – Nickil Maveli May 28 '21 at 19:13
  • First, what do you even mean by "tied"? That they share the same parameters? That they update at the same time? Or something else? – Natthaphon Hongcharoen May 28 '21 at 19:13
  • shares the same parameters at each step – Nickil Maveli May 28 '21 at 19:14
  • Yeah, I really don't understand what you want to do. This looks like an easy problem to me, so could you explain it in a bit more comprehensible way? I'm not a mathematician. – Natthaphon Hongcharoen May 28 '21 at 19:18
  • To the extent of what I understand, you can assign the weights of a model using the weights of an already existing model, like `self.W = other_model.W`. Is that remotely what you're looking for? – Natthaphon Hongcharoen May 28 '21 at 19:22
  • Sort of, but they must be jointly trained at each step. Note that they are not pre-trained models; the training happens ad hoc. Do you have pseudo-code for this purpose? – Nickil Maveli May 28 '21 at 19:34
  • Let's say you already have a `tucker = TuckER()` and `glove = GloVe()`. You can literally set `glove.W1 = tucker.W`. These parameters will be shared, and updating one will change the other. – Natthaphon Hongcharoen May 28 '21 at 19:38
  • But when you update one model with `optimizer.step()`, the gradient on the other model will be gone. You'll need to make one optimizer for the two models, like `optimizer = Adam(list(tucker.parameters()) + list(glove.parameters()))` – Natthaphon Hongcharoen May 28 '21 at 19:49
  • Cool, thanks. I get that. I already have a code that does the training for TuckER - https://pastebin.com/raw/GhGhYnCb. Do you think the same code can do the joint training too? – Nickil Maveli May 28 '21 at 19:59
  • I'm not sure about the CMTF and KGE specifics, but this code should do the job once you add the needed models. You can compute the losses individually; you just need to update them together (see the sketch after these comments). – Natthaphon Hongcharoen May 28 '21 at 20:18
  • BTW, since you need a single optimizer if you want to share the parameters: if you want a different learning rate for each model, you can roughly get that by scaling one model's loss. A larger loss weight acts roughly like a larger learning rate. – Natthaphon Hongcharoen May 28 '21 at 20:27
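
A rough sketch of the joint training step suggested in the comments above, assuming `tucker` and a GloVe-style module `glove` are already constructed with a tied embedding, and that `kg_loader`, `cooc_loader`, and `glove_weight` exist (all of these names are placeholders, not part of the TuckER code):

import torch

# One optimizer over both models; losses are computed separately and summed
# into a single backward/step, so the tied embedding gets gradients from both.
params = list(dict.fromkeys(list(tucker.parameters()) + list(glove.parameters())))
optimizer = torch.optim.Adam(params, lr=0.0005)

for (e1_idx, r_idx, targets), (i_idx, j_idx, x_ij) in zip(kg_loader, cooc_loader):
    optimizer.zero_grad()

    # KGE branch: BCE over all candidate objects, as in the TuckER training code.
    pred = tucker(e1_idx, r_idx)
    kge_loss = tucker.loss(pred, targets)

    # GloVe branch: weighted least squares on the sampled co-occurrence cells.
    glove_loss = glove(i_idx, j_idx, x_ij)

    # Tied weights receive gradients from both objectives in the same step.
    (kge_loss + glove_weight * glove_loss).backward()
    optimizer.step()

The `dict.fromkeys` call just removes the duplicate entry created by the shared embedding, so the optimizer sees each parameter once; apart from that, this is the single-optimizer, summed-loss recipe from the comments.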
