I searched a lot for this but havent still got a clear idea so I hope you can help me out:
I am trying to translate german texts to english! I udes this code:
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-de-en")
batch = tokenizer(
list(data_bert[:100]),
padding=True,
truncation=True,
max_length=250,
return_tensors="pt")["input_ids"]
results = model(batch)
Which returned me a size error! I fixed this problem (thanks to the community: https://github.com/huggingface/transformers/issues/5480) with switching the last line of code to:
results = model(input_ids = batch,decoder_input_ids=batch)
Now my output looks like a really long array. What is this output precisely? Are these some sort of word embeddings? And if yes: How shall I go on with converting these embeddings to the texts in the english language? Thanks alot!