1

The title says it all - I want to save a PyTorch model in an S3 bucket. What I tried was the following:

import boto3

s3 = boto3.client('s3')
saved_model = model.to_json()
output_model_file = output_folder + "pytorch_model.json"
s3.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=saved_model)

Unfortunately this doesn't work, as .to_json() only exists for TensorFlow/Keras models. Does anyone know how to do it in PyTorch?

spadel

4 Answers

7

Try serializing the model to an in-memory buffer and writing that to S3:

import io
import torch

buffer = io.BytesIO()
torch.save(model, buffer)  # serialize the whole model into the in-memory buffer
s3.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=buffer.getvalue())
igrinis
  • One question though: what will be the file type of the model in this case, and how should I name it? (In the example I gave I used .json, but that is probably wrong here.) – spadel Feb 18 '21 at 08:10
  • This would be a binary file containing the `pytorch` model. You can call it "model.bin" or "model.pt"; it does not really matter. You can later download the file locally and load it with `model = torch.load(path_to_local_file)`. – igrinis Feb 18 '21 at 09:20
  • Is there a way to directly load model from S3 (without downloading the file locally)? I am developing by connecting from local notebook to remote kernel. I would like to try best to avoid interacting with file system on remote host. – panc Apr 26 '22 at 20:08
  • Sure, just the other way around. Get a `boto3` file object and pass it to the model loading function instead of the "regular" file handler/name. See [this question](https://stackoverflow.com/questions/37087203/retrieve-s3-file-as-object-instead-of-downloading-to-absolute-system-path) for an example, just use `BytesIO` instead of `StringIO`. – igrinis Apr 27 '22 at 05:25
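A minimal sketch of that load-directly-from-S3 approach, using the bucket from the question (the object key here is illustrative):

import io
import boto3
import torch

s3 = boto3.client('s3')
obj = s3.get_object(Bucket="power-plant-embeddings", Key="folder/pytorch_model.bin")
buffer = io.BytesIO(obj["Body"].read())  # read the object into memory, no local file needed
model = torch.load(buffer)  # the model's class must be importable here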
2
  1. The first step is to serialize your model to a file. There are many ways to do it; with the basic PyTorch library you can use its out-of-the-box tools:

    # Serialize the entire model to disk
    torch.save(the_model, 'your/path/to/model')

  2. Once you have it on disk, you can upload it to S3:

    s3 = boto3.resource('s3')
    s3.Bucket('bucketname').upload_file('your/path/to/model', 'folder/sub/path/to/s3key')

  3. Later you can simply download it and deserialize it back into the model:

    s3 = boto3.resource('s3')
    s3.Bucket('bucketname').download_file(
        'folder/sub/path/to/s3key',
        'your/path/to/model'
    )

    the_model = torch.load('your/path/to/model')
GensaGames
  • Thanks for the answer - is there any way to save the model in the s3 bucket without saving it locally first? – spadel Feb 15 '21 at 18:00
  • @spadel yes, you can transfer the binary directly to S3. The question there is how you serialize the model to binary, and whether that's supported by the PyTorch exporter. – GensaGames Feb 15 '21 at 20:05
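For completeness, a sketch of that direct-to-S3 variant, combining the in-memory buffer from the accepted answer with the resource API used in this answer (bucket and key names are placeholders):

import io
import boto3
import torch

buffer = io.BytesIO()
torch.save(the_model, buffer)  # serialize into memory instead of to disk
buffer.seek(0)  # rewind so the upload starts from the beginning

s3 = boto3.resource('s3')
s3.Bucket('bucketname').upload_fileobj(buffer, 'folder/sub/path/to/s3key')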
1

To expand a bit on the previous answers: there are two different guidelines in the PyTorch documentation on how to save a model, based on what you want to do with it later when you load it again.

  1. If you want to load the model for inference (i.e., to run predictions), then the documentation recommends using torch.save(model.state_dict(), PATH); see the loading sketch after the next snippet.
  2. If you want to load the model to resume training then the documentation recommends doing a bit more, so that you can properly resume training:
torch.save({
   'epoch': epoch,
   'model_state_dict': model.state_dict(),
   'optimizer_state_dict': optimizer.state_dict(),
   'loss': loss,
   ...
}, PATH)
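For the inference case in point 1, loading is the mirror image of saving (a sketch; MyModel stands in for whatever model class you trained):

model = MyModel(*args, **kwargs)  # re-create the architecture first
model.load_state_dict(torch.load(PATH))
model.eval()  # put dropout/batch-norm layers into evaluation mode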

In terms of moving those saved models into S3, the modelstore open source library could help you with that. Under the hood, this library is calling those same save() functions, creating a zip archive of the resulting files, and then storing models into a structured prefix in an S3 bucket. In practice, using it would look like this:

import os

from modelstore import ModelStore

model_store = ModelStore.from_aws_s3(os.environ["AWS_BUCKET_NAME"])

model, optim = train()  # your training code

# The upload function takes a domain string to organise and version your models
model_store.pytorch.upload("my-model-domain", model=model, optimizer=optim)
neal
-1

With PyTorch you can use cloudpickle to serialize and save the model:

# Serialize the model
import cloudpickle
from os import path

with open(path.join(path_to_generic_model_artifact, "model.pkl"), "wb") as outfile:
    # model is the trained model object being serialized
    cloudpickle.dump(model, outfile)

Deserialize the model:

import pickle
import os

model = pickle.load(open(os.path.join(model_dir, model_file_name), 'rb'))
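To cover the S3 part of the question, the resulting model.pkl can then be uploaded with boto3 (a sketch reusing the bucket name from the question; the destination key is a placeholder):

import boto3

s3 = boto3.client('s3')
s3.upload_file(
    path.join(path_to_generic_model_artifact, "model.pkl"),  # local pickle from above
    "power-plant-embeddings",
    "models/model.pkl",
)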
Hussain Bohra
  • Thanks for your answer - but how exactly is this saving the model in the s3 bucket? – spadel Feb 14 '21 at 15:46
  • cloudpickle.dump will write a model.pkl file (which is binary; we can't read it). You just need to upload the pickle file to S3. – Hussain Bohra Feb 15 '21 at 00:51
  • Whoever down voted this answer can you explain why? – Hussain Bohra Feb 16 '21 at 22:08
  • I did not downvote your answer, but I see two reasons why someone did. The first and main reason is that your answer is not relevant to the OP's question of saving the model to S3. Second, when you serialize something with `cloudpickle` and deserialize it back with `pickle`, either you don't need `cloudpickle` in the first place or you lose data in the process. – igrinis Feb 18 '21 at 09:35