I've been working on a project where I want to calculate the similarity between two sentences as input to my model (using BERT from the HuggingFace Transformers library and the Quora sentence-pair dataset from Kaggle). I was trying to use torch cosine similarity as my scoring function, but for every input I get a value around 0.98, whereas cosine similarity should produce values between -1 and 1. I have included the relevant code below:

class SentencePairSimilarityBert(nn.Module):
    def __init__(self):
        super(SentencePairSimilarityBert, self).__init__()
        # Two separate BERT encoders, one for each sentence in the pair
        self.bert1 = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.bert2 = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.cosine_similarity = nn.CosineSimilarity(dim=1, eps=1e-6)

    def forward(self, q1_input_ids, q1_attention_mask, q2_input_ids, q2_attention_mask):
        # Pooled [CLS] output of each encoder, shape (batch_size, hidden_size)
        _, q1_pooled_output = self.bert1(input_ids=q1_input_ids, attention_mask=q1_attention_mask, return_dict=False)
        _, q2_pooled_output = self.bert2(input_ids=q2_input_ids, attention_mask=q2_attention_mask, return_dict=False)
        # Cosine similarity between the two pooled outputs, shape (batch_size,)
        cosine_distance = self.cosine_similarity(q1_pooled_output, q2_pooled_output)
        return cosine_distance
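
For reference, here is a minimal sketch of how the forward pass can be exercised (the example sentences, tokenizer settings, and max length are just illustrative assumptions, not my real data):

import torch
import transformers

tokenizer = transformers.BertTokenizer.from_pretrained('bert-base-uncased')
model = SentencePairSimilarityBert()

# Illustrative sentence pair; padding/truncation settings are arbitrary
enc1 = tokenizer("How do I learn Python?", return_tensors='pt',
                 padding='max_length', max_length=32, truncation=True)
enc2 = tokenizer("What is the capital of France?", return_tensors='pt',
                 padding='max_length', max_length=32, truncation=True)

with torch.no_grad():
    score = model(
        q1_input_ids=enc1['input_ids'],
        q1_attention_mask=enc1['attention_mask'],
        q2_input_ids=enc2['input_ids'],
        q2_attention_mask=enc2['attention_mask'],
    )
# score has shape (batch_size,) and should lie in [-1, 1]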

I'm using Mean Squared Error (nn.MSELoss) as the loss function because I want to measure how far the predictions are from the target values.

The cosine_distance for a batch of 4 comes out to be: [0.982, 0.976, 0.974, 0.948], while the targets are: [0, 0, 1, 0]

0 means the two sentences are dissimilar and 1 means they are similar.

I'm trying the training approach below, but I don't think it's correct: the cosine similarity lies between -1 and 1, while my targets are 0 and 1. How would I map my targets to the range of the cosine similarity so that I can calculate the loss and back-propagate?

loss_function = nn.MSELoss()
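
For illustration, a minimal sketch of the two obvious ways to make the ranges match, using the example batch from above (rescaling the targets is what the first comment below suggests; the alternative squashes the cosine output instead):

import torch
import torch.nn as nn

mse = nn.MSELoss()
cosine = torch.tensor([0.982, 0.976, 0.974, 0.948])   # model output, range [-1, 1]
targets = torch.tensor([0., 0., 1., 0.])               # 0 = dissimilar, 1 = similar

# Option 1: rescale the 0/1 targets into [-1, 1] so they match the cosine range
loss_a = mse(cosine, targets * 2 - 1)

# Option 2: squash the cosine output into [0, 1] and keep the 0/1 targets
loss_b = mse((cosine + 1) / 2, targets)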

This is my train function below:

model = model.train()

losses = []

for dictionary in data_loader:
    q1_input_ids = dictionary['q1_input_ids'].to(device)
    q1_attention_mask = dictionary['q1_attention_mask'].to(device)
    q2_input_ids = dictionary['q2_input_ids'].to(device)
    q2_attention_mask = dictionary['q2_attention_mask'].to(device)
    targets = dictionary['targets'].to(device)

    results = model(
        q1_input_ids=q1_input_ids,
        q1_attention_mask=q1_attention_mask,
        q2_input_ids=q2_input_ids,
        q2_attention_mask=q2_attention_mask,
    )

    loss = loss_function(results, targets)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
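
If the target rescaling from the sketch above were used, presumably only the loss line inside the loop would change, along these lines:

    # hypothetical: map the 0/1 targets into [-1, 1] before computing MSE
    loss = loss_function(results, targets.float() * 2 - 1)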

Any help would be appreciated. Thanks in advance!

  • Well one solution would be to just scale the targets accordingly: `targets = targets*2 -1` – DerekG May 04 '21 at 18:02
  • @DerekG can you elaborate please. – dev1ce May 04 '21 at 18:57
  • Are these `[0.982, 0.976, 0.974, 0.948]` the values after finetuning? If not, it is because BERT does not produce meaningful sentence embeddings without finetuning. Please check this [answer](https://stackoverflow.com/a/64237402/6664872) for further details and a better alternative (sketched briefly below). – cronoik May 12 '21 at 21:19
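
For reference, a minimal sketch of the alternative mentioned in the last comment, using the sentence-transformers library (the checkpoint name 'all-MiniLM-L6-v2' is just an assumed example; the linked answer discusses suitable models):

from sentence_transformers import SentenceTransformer, util

st_model = SentenceTransformer('all-MiniLM-L6-v2')  # assumed example checkpoint

emb1 = st_model.encode("How do I learn Python?", convert_to_tensor=True)
emb2 = st_model.encode("What is the best way to learn Python?", convert_to_tensor=True)

score = util.cos_sim(emb1, emb2)  # cosine similarity, in [-1, 1]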
