I've been working on a project where I want to calculate the similarity between two sentences given as input to my model (using BERT from the HuggingFace Transformers library and the Quora sentence pair dataset from Kaggle). I'm using torch cosine similarity as my scoring function, but for every input I get a value around 0.98, even though cosine similarity can take any value between -1 and 1. I have included the relevant code below:
import torch.nn as nn
import transformers

class SentencePairSimilarityBert(nn.Module):
    def __init__(self):
        super(SentencePairSimilarityBert, self).__init__()
        # One BERT encoder per question (the two encoders do not share weights)
        self.bert1 = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.bert2 = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.cosine_similarity = nn.CosineSimilarity(dim=1, eps=1e-6)

    def forward(self, q1_input_ids, q1_attention_mask, q2_input_ids, q2_attention_mask):
        # With return_dict=False, BertModel returns (sequence_output, pooled_output)
        _, q1_pooled_output = self.bert1(input_ids=q1_input_ids, attention_mask=q1_attention_mask, return_dict=False)
        _, q2_pooled_output = self.bert2(input_ids=q2_input_ids, attention_mask=q2_attention_mask, return_dict=False)
        # One cosine similarity value per sentence pair in the batch
        cosine_distance = self.cosine_similarity(q1_pooled_output, q2_pooled_output)
        return cosine_distance
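As a sanity check on the expected range, here is a minimal sketch using hand-made toy vectors (assumed values for illustration, not BERT outputs). It shows that nn.CosineSimilarity(dim=1) returns one score per row and that the scores can reach both ends of the -1 to 1 range.

```python
import torch
import torch.nn as nn

cosine_similarity = nn.CosineSimilarity(dim=1, eps=1e-6)

# Toy 2-D "embeddings" (hypothetical values); each row is one sentence pair
a = torch.tensor([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
b = torch.tensor([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])

# Row 0: same direction, row 1: orthogonal, row 2: opposite direction
scores = cosine_similarity(a, b)  # shape (3,), one score per row
print(scores)
```

The vector lengths do not matter here, only the directions, which is why the first pair still scores at the top of the range even though the second vector is twice as long.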
I'm using mean squared error (MSELoss) as the loss function because I want to measure how far I am from my target/real value.
The cosine_distance for a batch of 4 comes out as [0.982, 0.976, 0.974, 0.948], and the targets are [0, 0, 1, 0].
0 means the two sentences are dissimilar and 1 means they are similar.
I'm trying the following training approach, but I don't think I'm doing it correctly: cosine_distance should lie between -1 and 1, while my targets are 0 and 1. How should I map my targets to the range of cosine_distance in order to calculate the loss and back-propagate?
loss_function = nn.MSELoss()
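For the mapping question above, two simple options can be sketched; both are assumptions rather than a verified recipe. Either rescale the 0/1 targets to -1/1 so they share the range of the cosine similarity, or rescale the similarity to the 0 to 1 range so it matches the targets. The two helper names are hypothetical, and the scores and targets are the values from the batch above.

```python
import torch
import torch.nn as nn

loss_function = nn.MSELoss()

# Values copied from the batch of 4 shown above
cosine_distance = torch.tensor([0.982, 0.976, 0.974, 0.948])
targets = torch.tensor([0, 0, 1, 0])


def targets_to_cosine_range(targets):
    """Hypothetical helper: map labels {0, 1} to {-1, 1}."""
    return 2.0 * targets.float() - 1.0


def cosine_to_unit_range(cosine):
    """Hypothetical helper: map similarities from [-1, 1] to [0, 1]."""
    return (cosine + 1.0) / 2.0


# Option A: raw similarities against -1/1 targets
loss_a = loss_function(cosine_distance, targets_to_cosine_range(targets))

# Option B: rescaled similarities against 0/1 targets (cast to float)
loss_b = loss_function(cosine_to_unit_range(cosine_distance), targets.float())

print(loss_a.item(), loss_b.item())
```

Under MSE the two options differ only by a constant factor of 4, because c - (2t - 1) = 2 * ((c + 1) / 2 - t), so they should produce the same gradient directions up to scale.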
This is my train function below:
model = model.train()
losses = []
for dictionary in data_loader:
    q1_input_ids = dictionary['q1_input_ids'].to(device)
    q1_attention_mask = dictionary['q1_attention_mask'].to(device)
    q2_input_ids = dictionary['q2_input_ids'].to(device)
    q2_attention_mask = dictionary['q2_attention_mask'].to(device)
    targets = dictionary['targets'].to(device)
    results = model(
        q1_input_ids=q1_input_ids,
        q1_attention_mask=q1_attention_mask,
        q2_input_ids=q2_input_ids,
        q2_attention_mask=q2_attention_mask,
    )
    loss = loss_function(results, targets)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
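To check the mechanics of this loop without downloading BERT, the sketch below runs the same steps on a tiny stand-in model. Everything in it is an assumption for illustration: TinyPairModel, the random float features that replace tokenized input, and the SGD and StepLR choices. It also casts the integer targets to float so that their dtype matches the float predictions passed to MSELoss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyPairModel(nn.Module):
    """Hypothetical stand-in for the BERT pair model: two small linear encoders."""

    def __init__(self, in_dim=8, out_dim=4):
        super().__init__()
        self.enc1 = nn.Linear(in_dim, out_dim)
        self.enc2 = nn.Linear(in_dim, out_dim)
        self.cosine_similarity = nn.CosineSimilarity(dim=1, eps=1e-6)

    def forward(self, q1, q2):
        # One cosine similarity value per pair in the batch
        return self.cosine_similarity(self.enc1(q1), self.enc2(q2))


device = torch.device("cpu")
model = TinyPairModel().to(device)
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# One fake batch of 4 pairs (random features, integer 0/1 targets)
data_loader = [{
    "q1": torch.randn(4, 8),
    "q2": torch.randn(4, 8),
    "targets": torch.tensor([0, 0, 1, 0]),
}]

model.train()
losses = []
for _ in range(20):  # repeat the single batch a few times
    for dictionary in data_loader:
        q1 = dictionary["q1"].to(device)
        q2 = dictionary["q2"].to(device)
        # Cast integer labels to float to match the float predictions
        targets = dictionary["targets"].to(device).float()

        results = model(q1, q2)
        loss = loss_function(results, targets)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

print(losses[0], losses[-1])
```

Comparing the first and last loss values gives a quick signal of whether the loop is updating the parameters at all before running it on the full BERT model.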
Any help would be appreciated. Thanks in advance!