Does anyone have a good method for splitting up PyTorch weights and putting them back together? I sometimes need to move my weights between servers, and GitHub has a maximum file size of 100 MB. Since PyTorch weights are essentially a dictionary of tensors, I was thinking there must be a way to break a model's weights into files smaller than 100 MB when saving, and to reassemble them before loading. Does anyone have a good method for handling this? Unfortunately, I don't have access to Git LFS in my research lab, so I can't use that.
-
I would not recommend committing large files. Store your large models in an appropriate service such as S3, Google Drive, Dropbox, or similar online storage that supports large files. Have a script that uploads and downloads on demand. – Prayson W. Daniel Aug 15 '21 at 11:14
-
@PraysonW.Daniel any good resources on writing scripts to upload and download from Google Drive? – Aug 15 '21 at 11:39
-
You understand how PyTorch weights are saved, and you have your requirements well defined (split into <100 MB chunks). Have you tried splitting the *dict* into multiple weight files with Python? If you did, please show us your progress... else please try! – Ivan Aug 15 '21 at 12:03
-
PyDrive: https://github.com/googlearchive/PyDrive. See also https://dev.to/abanand/uploading-downloading-files-from-google-drive-using-python-2pll – Prayson W. Daniel Aug 15 '21 at 12:05
-
@Ivan I think Prayson's solution makes more sense. In my experience, pushing a lot of small files to GitHub also causes problems; sometimes I get a weird error. So splitting up large models might not be the right solution. – Aug 15 '21 at 12:07
-
Please read [slowness on git due to large files](https://stackoverflow.com/questions/57134772/what-operations-become-slow-when-git-repos-become-large-and-why). You want to avoid pushing binaries to git. It will make life hard for you very fast. – Gulzar Aug 15 '21 at 13:45
-
Another tool you should look at is Data Version Control (DVC) - https://dvc.org/ – Prayson W. Daniel Aug 16 '21 at 08:47
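For anyone who still wants the split-and-reassemble route from the question, here is a minimal sketch. Note that it swaps in a simpler technique than splitting the dict of tensors: it cuts the already-saved checkpoint file into byte chunks and concatenates them again, so it needs only the standard library. The function names (`split_file`, `join_files`, `sha256`), the `.partNNNN` naming scheme, and the 95 MiB chunk size are all assumptions made for illustration, not anything PyTorch or GitHub prescribes.

```python
import hashlib
from pathlib import Path

# Assumption: 95 MiB leaves some headroom under a 100 MB per-file limit.
CHUNK_BYTES = 95 * 1024 * 1024


def split_file(path, chunk_bytes=CHUNK_BYTES):
    """Split `path` into numbered part files next to it; return their paths."""
    path = Path(path)
    parts = []
    with path.open("rb") as src:
        index = 0
        while True:
            block = src.read(chunk_bytes)
            if not block:
                break
            # Zero-padded index so that sorting by name restores the order.
            part = path.with_name(f"{path.name}.part{index:04d}")
            part.write_bytes(block)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, out_path):
    """Concatenate the part files (sorted by name) into `out_path`."""
    out_path = Path(out_path)
    with out_path.open("wb") as dst:
        for part in sorted(Path(p) for p in parts):
            dst.write(part.read_bytes())
    return out_path


def sha256(path):
    """Hex digest of a file, used to check that the reassembled copy matches."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()
```

After `join_files(...)`, the reassembled file should be byte-identical to the original, so loading it with `torch.load` should behave the same as loading the original checkpoint; comparing `sha256` of both files is a cheap way to confirm that before loading. The concerns raised in the comments about keeping binaries in git still apply to the part files.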