
I am training the T5 transformer, which is based on TensorFlow, using the code at the following link:

https://github.com/google-research/text-to-text-transfer-transformer

Here is a sample (input, output):

input:

b'[atomic]:<subject>PersonX plays a ___ in the war</subject><relation>oReact</relation>'

output:

<object>none</object>

However, for the prediction I get:

 ⁇ object>none ⁇ /object>

Each `<` is replaced with `⁇`. What should I do to resolve this problem?

Update: I found that, strangely, `<` is out of vocabulary for the T5 tokenizer, which is SentencePiece. I just don't know how to add it.

Innat
Ahmad
  • how about using regular expression? – Innat Apr 21 '21 at 14:33
  • @M.Innat I found that `<` is out of vocabulary for the `t5` tokenizer; I just don't know how to add it – Ahmad Apr 21 '21 at 14:39
  • [Any](https://stackoverflow.com/questions/60068129/transformers-pretrainedtokenizer-add-tokens-functionality) good. – Innat Apr 21 '21 at 14:44
  • @M.Innat Thank you. However, I have now added that I don't use Hugging Face but T5 directly, which uses SentencePiece – Ahmad Apr 21 '21 at 14:51
  • I see. I think you should also update your title with more relevant words. – Innat Apr 21 '21 at 14:57
  • perhaps related: https://stackoverflow.com/questions/73322462/how-to-add-all-standard-special-tokens-to-my-hugging-face-tokenizer-and-model? – Charlie Parker Aug 11 '22 at 15:19

1 Answer


To my knowledge, you can add new tokens using the tokenizer's `add_tokens()` method. More details can be found in the Hugging Face documentation.
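As a rough sketch of what that could look like: this applies only to the Hugging Face `transformers` port of T5, not to the original TensorFlow T5 repository with its SentencePiece model that the question uses. The helper names `add_missing_tokens` and `load_patched_t5` and the `t5-small` checkpoint are my own choices for illustration. The idea is to add only the tokens that currently map to the unknown id, and then resize the model's embedding matrix to match the larger vocabulary.

```python
def add_missing_tokens(tokenizer, model, candidates):
    """Add every candidate that currently maps to the unknown-token id.

    Hypothetical helper: `tokenizer` and `model` are assumed to be Hugging Face
    objects (for example T5Tokenizer and T5ForConditionalGeneration).
    Returns the list of tokens that were actually added.
    """
    unk_id = tokenizer.unk_token_id
    # A token counts as out of vocabulary if it resolves to the unknown id.
    missing = [tok for tok in candidates
               if tokenizer.convert_tokens_to_ids(tok) == unk_id]
    if missing:
        tokenizer.add_tokens(missing)
        # The embedding matrix has to grow to cover the new token ids.
        model.resize_token_embeddings(len(tokenizer))
    return missing


def load_patched_t5(model_name="t5-small"):
    """Load a T5 checkpoint and add `<` if it is missing (assumed checkpoint name)."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    added = add_missing_tokens(tokenizer, model, ["<"])
    return tokenizer, model, added
```

Calling `load_patched_t5()` would download the checkpoint and return the patched tokenizer and model. The embeddings for newly added tokens start out untrained, so the model still needs fine-tuning before it can be expected to emit `<` sensibly.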

Arij Aladel
  • do you know how to do this: https://stackoverflow.com/questions/73322462/how-to-add-all-standard-special-tokens-to-my-hugging-face-tokenizer-and-model? – Charlie Parker Aug 11 '22 at 15:20