I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) to create sentence embeddings with the pre-trained model bert-base-nli-mean-tokens. I have an application that will be deployed to a device without internet access. How to save the model has already been answered here: Download pre-trained BERT model locally. Yet I'm stuck at loading the saved model from the locally saved path.

When I try to save the model using the above-mentioned technique, these are the output files:

('/bert-base-nli-mean-tokens/tokenizer_config.json',
 '/bert-base-nli-mean-tokens/special_tokens_map.json',
 '/bert-base-nli-mean-tokens/vocab.txt',
 '/bert-base-nli-mean-tokens/added_tokens.json')
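For reference, this is roughly the save code I used (the tuple above is the return value of save_pretrained; the local path is illustrative):

from transformers import AutoTokenizer

# Run once with internet access
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
# Writes only the tokenizer files listed above; no config.json or model weights
tokenizer.save_pretrained("/bert-base-nli-mean-tokens")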

When I try to load it into memory using

tokenizer = AutoTokenizer.from_pretrained(to_save_path)

I'm getting

Can't load config for '/bert-base-nli-mean-tokens'. Make sure that:

- '/bert-base-nli-mean-tokens' is a correct model identifier listed on 'https://huggingface.co/models'

- or '/bert-base-nli-mean-tokens' is the correct path to a directory containing a config.json 

2 Answers

You can download and load the model like this

from sentence_transformers import SentenceTransformer

modelPath = "local/path/to/model"

# Download the model once (requires internet) and save it to disk
model = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
model.save(modelPath)

# From then on, load it from the local path (works offline)
model = SentenceTransformer(modelPath)

This worked for me. You can check the SBERT documentation for details of the SentenceTransformer class [here][1].

[1]: https://www.sbert.net/docs/package_reference/SentenceTransformer.html
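As a quick sanity check after loading from the local path, you can encode a few sentences (a minimal sketch using the standard encode method; the sentences are illustrative):

from sentence_transformers import SentenceTransformer

# Load entirely from disk; no internet access is needed at this point
model = SentenceTransformer(modelPath)
embeddings = model.encode(["This is a test sentence.", "Another sentence."])
print(embeddings.shape)  # (2, 768) for BERT-base models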


There are many ways to solve this issue:

  • Assuming you have trained your BERT base model locally (Colab/notebook), then in order to use it with the Hugging Face AutoClass, the model (along with the tokenizer, vocab.txt, configs, special tokens and TF/PyTorch weights) has to be uploaded to the Hugging Face Hub. The steps to do this are mentioned here; see also the upload sketch after this list. Once it is uploaded, a repository is created under your username, and the model can be accessed as follows:
from transformers import AutoTokenizer
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("<username>/<model-name>")
  • The second way is to use the trained model locally, and this can be done by using pipelines. The following is an example of how to use a model trained (and saved) locally for your use case (taken from my locally trained QA model):
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

# Point both the model and the tokenizer at the locally saved directory
nlp_QA = pipeline(
    'question-answering',
    model='./abhilash1910/distilbert-squadv1',
    tokenizer='./abhilash1910/distilbert-squadv1',
)
QA_inp = {
    'question': 'What is the fund price of Huggingface in NYSE?',
    'context': 'Huggingface Co. has a total fund price of $19.6 million dollars'
}
result = nlp_QA(QA_inp)
result
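
For the first option, the upload itself can also be done from code (a sketch, assuming a recent transformers version that provides the push_to_hub helper; the paths and repo name are illustrative):

from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("path/to/locally/trained/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/locally/trained/model")

# Requires `huggingface-cli login` beforehand; creates <username>/my-bert-model on the Hub
model.push_to_hub("my-bert-model")
tokenizer.push_to_hub("my-bert-model")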

There are also other ways to resolve this, but these should help. This list of pretrained models might also be useful.
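
For the asker's offline scenario specifically, the key point is that the saved directory must contain the model's config.json and weights, not only the tokenizer files. A minimal sketch using the standard save_pretrained/from_pretrained pair (the path is illustrative):

from transformers import AutoModel, AutoTokenizer

local_path = "/bert-base-nli-mean-tokens"

# Run once with internet access: writes config.json, model weights and tokenizer files
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
model.save_pretrained(local_path)
tokenizer.save_pretrained(local_path)

# From then on, load entirely offline
model = AutoModel.from_pretrained(local_path)
tokenizer = AutoTokenizer.from_pretrained(local_path)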

  • I wish to use the pre-trained bert-base-nli-mean-tokens model. The third option is not feasible, as my local system has no internet access. I should be able to save it once (downloading from the internet) and from then on it should load from the system without any internet access. from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens"); tokenizer.save_pretrained(local_path); loaded_tokenizer = AutoTokenizer.from_pretrained(local_path) — when I load the model, I get the above-mentioned error. – neha tamore Dec 23 '20 at 09:14
  • I don't understand why everyone just assumes `BERT` and `sentence-transformers` are the same thing? – theProcrastinator Jan 11 '22 at 15:53