I am working on a text classification problem where I want to use a BERT model as the base, followed by Dense layers. I want to know how the three tokenizer arguments max_length, padding, and truncation work. For example, suppose I have these 3 sentences:
'My name is slim shade and I am an aspiring AI Engineer',
'I am an aspiring AI Engineer',
'My name is Slim'
So what will these 3 arguments do? What I think is as follows:
max_length=5 will keep all the sentences at a length of strictly 5.
padding='max_length' will add a padding of 1 to the third sentence.
truncation=True will truncate the first and second sentences so that their length is strictly 5.
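To make my understanding concrete, here is a minimal pure-Python sketch of the behaviour I expect. It is only an illustration under simplifying assumptions, not how the library is implemented: `pad_and_truncate` is a hypothetical helper, the sentences are split on whitespace instead of WordPiece, and the `[CLS]`/`[SEP]` special tokens that the real BERT tokenizer adds (and which also count toward max_length) are left out.

```python
def pad_and_truncate(tokens, max_length, pad_token="[PAD]"):
    """Hypothetical helper: force a token list to exactly max_length items."""
    # What I expect truncation=True to do: drop everything past max_length.
    clipped = tokens[:max_length]
    # What I expect padding='max_length' to do: fill short sequences up to max_length.
    n_pad = max_length - len(clipped)
    padded = clipped + [pad_token] * n_pad
    # Attention mask: 1 for real tokens, 0 for padding positions.
    attention_mask = [1] * len(clipped) + [0] * n_pad
    return padded, attention_mask


sentences = [
    "My name is slim shade and I am an aspiring AI Engineer",
    "I am an aspiring AI Engineer",
    "My name is Slim",
]

# Whitespace split is a stand-in for the real tokenizer (assumption).
results = [pad_and_truncate(s.lower().split(), max_length=5) for s in sentences]

for tokens, mask in results:
    print(tokens, mask)
```

In this simplified model the first two sentences are cut to 5 tokens and the third gets one `[PAD]` token with a 0 in its attention mask. With the real tokenizer the counts may differ because of the special tokens and subword splitting.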
Please correct me if I am wrong.
Below is the code I have used.
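! pip install transformers==3.5.1
import torch
from transformers import BertTokenizerFast
text = ['My name is slim shade and I am an aspiring AI Engineer',
        'I am an aspiring AI Engineer',
        'My name is Slim']
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
# Pad or truncate every sentence to exactly 5 token ids
tokens = tokenizer.batch_encode_plus(text, max_length=5, padding='max_length', truncation=True)
text_seq = torch.tensor(tokens['input_ids'])
text_mask = torch.tensor(tokens['attention_mask'])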