I'm trying to load a Hugging Face model and tokenizer. This normally works really easily (I've done it with a dozen models):
from transformers import pipeline, BertForMaskedLM, AutoTokenizer, RobertaForMaskedLM, AlbertForMaskedLM, ElectraForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = BertForMaskedLM.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
But for some reason I'm getting an error when I try to load this one:
tokenizer = AutoTokenizer.from_pretrained("sultan/BioM-ALBERT-xxlarge", use_fast=False)
model = AlbertForMaskedLM.from_pretrained("sultan/BioM-ALBERT-xxlarge")
tokenizer.vocab
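In case it's relevant, this is the direct (non-Auto) tokenizer call I would try next for an ALBERT checkpoint. This is just a sketch, assuming sentencepiece is installed; I haven't confirmed it behaves any differently for this particular repo:

from transformers import AlbertTokenizer

# Alternative attempt: load the slow ALBERT tokenizer class directly
# instead of going through AutoTokenizer (hypothetical, untested for this repo).
tokenizer = AlbertTokenizer.from_pretrained("sultan/BioM-ALBERT-xxlarge")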
I found this question, which seems related, but it looks like that was an issue with the git repo itself rather than with Hugging Face. I checked the actual repo where this model is hosted on Hugging Face (link), and it clearly has a vocab file (PubMD-30k-clean.vocab), just like the rest of the models I've loaded.
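For completeness, here is a minimal sketch of how the same check can be done programmatically, assuming the huggingface_hub package is installed (the repo id is the same one used above):

from huggingface_hub import list_repo_files

# List every file hosted in the model repo to confirm which
# tokenizer/vocab files are actually present.
files = list_repo_files("sultan/BioM-ALBERT-xxlarge")
print(files)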