I am working on an HPC cluster whose worker nodes have no internet access, and the only option I can find for saving a SetFit trainer after training is to push it to the Hugging Face Hub. How do I save it locally to disk instead?
4 Answers
9
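SetFitModel has a method for saving to a local directory (if you only have the trainer, the model is available as trainer.model):
model._save_pretrained(save_directory)
To load it again:
saved_model = SetFitModel._from_pretrained(save_directory)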
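A slightly fuller sketch of the same round trip, using the public `save_pretrained` / `from_pretrained` names rather than the underscore-prefixed hooks. This assumes a SetFit version in which `SetFitModel` inherits those methods from the `huggingface_hub` mixin, and that `from_pretrained` treats an existing directory as a local path without contacting the Hub; check both against your installed version. `save_locally` and `load_locally` are hypothetical helper names, not part of SetFit.

```python
from pathlib import Path


def save_locally(model, save_directory):
    """Save a model to disk and return the names of the files written.

    `model` is assumed to expose `save_pretrained(path)`, as SetFitModel
    is expected to do through the huggingface_hub mixin (an assumption).
    """
    out_dir = Path(save_directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    model.save_pretrained(str(out_dir))
    return sorted(p.name for p in out_dir.iterdir())


def load_locally(save_directory):
    """Reload a SetFit model from a directory written by save_locally."""
    out_dir = Path(save_directory)
    # Fail early with a clear message instead of letting the loader
    # interpret a missing path as a Hub model id.
    if not out_dir.is_dir():
        raise FileNotFoundError(f"No saved model directory at {out_dir}")

    # Imported lazily so the helpers can be imported without setfit installed.
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(str(out_dir))
```

With a trained trainer, this would be called as `save_locally(trainer.model, "my-setfit-model")` on the node that ran the training, and `load_locally("my-setfit-model")` wherever predictions are needed.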

user18610139
1
I think you can do this with either pickle or joblib:
import pickle
import joblib
# Save with pickle; the with block closes the file handle
with open('model.pkl', 'wb') as f:
    pickle.dump(trainer, f)
# Or save with joblib
joblib.dump(trainer, 'model.joblib')
And load it later with:
job_model = joblib.load('model.joblib')
with open('model.pkl', 'rb') as f:
    pkl_model = pickle.load(f)
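The pickle variant can be sketched end to end with context managers, so the file handles are closed deterministically. The `SimpleNamespace` object below is only a stand-in for a trained trainer, and `save_object` / `load_object` are hypothetical helper names. Keep in mind that pickle stores references to classes rather than their code, so the environment that loads the file needs the same libraries installed (ideally in the same versions), and pickle files should only be loaded from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_object(obj, path):
    """Pickle obj to path, closing the file even if pickling fails."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load a pickled object back from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained trainer (hypothetical; a real trainer would go here).
trainer_stand_in = SimpleNamespace(labels=["negative", "positive"], note="demo")

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"
    save_object(trainer_stand_in, model_path)
    restored = load_object(model_path)

# The restored object carries the same attributes as the original.
print(restored == trainer_stand_in)
```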

innit
1
As an alternative to pushing your trainer to the Hub, as described in "SetFit for Text Classification", you can save the trainer locally and use it for prediction.
The SetFit model has a predict method in its source code, and you can call it on the model held by your reloaded trainer.
Save your model locally:
import joblib
# trainer is your SetFit object: setfit.trainer.SetFitTrainer
joblib.dump(trainer, 'my-awesome-setfit-model.joblib')
Load your model and run inference with it:
# Load the trainer
trainer = joblib.load('my-awesome-setfit-model.joblib')
# Use the model and predict
trainer.model.predict(["i loved the spiderman movie!", "pineapple on pizza is the worst "])

Stefan
0
You can use the sklearn wrapper:
Train the model
from setfit.modeling import SKLearnWrapper
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
clf = SKLearnWrapper(model, LogisticRegression())
sentences = ["good", "bad", "very good"]
labels = [1, 0, 1]
clf.fit(sentences, labels)
pred1 = clf.predict(["gooood"])
Save the model
path = "model1"
clf.save(path)
Load the model
clf = SKLearnWrapper(None, None)
clf.load(path)
Test
pred2 = clf.predict(["gooood"])
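# Compare as lists; a bare == on NumPy arrays is ambiguous for more than one prediction
assert list(pred1) == list(pred2)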

elyase