Say I am using tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True), and all I am doing with that tokenizer during fine-tuning of a new model is the standard tokenizer.encode().
In most examples I have seen, people save the tokenizer at the same time they save their model, but I am unclear on why that is necessary, since it seems to be an out-of-the-box tokenizer that does not get modified in any way during training.
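For context, here is a minimal sketch of the pattern I mean (the output directory name and model class are just placeholders, not from any particular example):

```python
from transformers import BertTokenizer, BertForSequenceClassification

# Load the stock tokenizer and a model to fine-tune
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# ... fine-tuning loop; the tokenizer is only ever used via tokenizer.encode(...) ...

# The pattern I keep seeing: save both to the same directory
output_dir = './fine_tuned_model'      # hypothetical path
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)  # why is this needed if the tokenizer never changed?
```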