
I have a number of fit objects that produce *.rds files larger than 100 MB. GitHub has a file size limit of 100 MB, and Git LFS doesn't work for me because I don't want to pay for the service.

Background & possible solutions: This question is the result of another question, where a solution for data.frames is proposed, but not all fit objects can easily be transformed into data.frame objects (see the linked question).

Question: What is the best way to commit large *.rds files to GitHub without using Git LFS?

mugdi
  • I think this needs to be broken into two questions. As for the second, I personally don't tend to think of git repos as a great place to store 100+ MB data files. Granted, I think they support large *artifacts* (releases), but I can't be sure. Perhaps you can automate (via GitHub Actions or similar) the production of these large files? – r2evans Mar 21 '22 at 15:27
  • Alright, I will split the question and link the broom-error question to this one to help map out a possible path to a solution! What do you mean by "production of these large files"? Some kind of script that splits large files into chunks of at most 100 MB? – mugdi Mar 21 '22 at 15:37
  • My comment is based solely on my experience with GH repos: I've only seen large-ish files in a repo's "Releases" section. I don't know if the GH file size restrictions apply to releases; I'm just suggesting that as a path of research if you had not already considered it. (I'm not a GH guru, it's just a thought.) Good luck! – r2evans Mar 21 '22 at 15:48
  • You can use https://dvc.org/ for versioning data. Files can be hosted somewhere else (network storage, S3, ...). – danlooo Mar 21 '22 at 15:51
  • I don't know what sort of "fit.objects" you are talking about, but many fitted models contain copies of all the data and a lot of other stuff that you might not need to save, depending on your use case. [See this blog entry that drastically reduces the size of a GLM fitted model](https://win-vector.com/2014/05/30/trimming-the-fat-from-glm-models-in-r/). (A minimal sketch of this trimming idea follows after the comments.) – Gregor Thomas Mar 21 '22 at 16:28
  • But I would also strongly reinforce the other comments suggesting that GitHub isn't an appropriate place to store large files in general. DVC looks cool, but a relatively simple solution is to create an S3 bucket and just stick your models there; include a timestamp or version number in the file name if you need to track versions. (A rough upload sketch follows after the comments.) – Gregor Thomas Mar 21 '22 at 16:29
  • Thanks for the comments. I guess I will follow this advice. Can anyone recommend a free S3-like service that is not Google? – mugdi Mar 22 '22 at 09:03
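
A minimal sketch of the trimming idea from the blog post linked above, assuming the model is a `glm` and that only `predict()` on new data is needed afterwards. `strip_glm` is a hypothetical helper, and which components are safe to drop depends on your use case:

```r
# Sketch only: trim heavy components from a glm fit before saving it as .rds.
# Assumes you only need predict() on new data later; adjust to your use case.
fit <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian())

strip_glm <- function(m) {          # hypothetical helper name
  m$data              <- NULL       # copy of the training data
  m$y                 <- NULL       # response vector
  m$model             <- NULL       # model frame
  m$residuals         <- NULL
  m$fitted.values     <- NULL
  m$effects           <- NULL
  m$linear.predictors <- NULL
  m$weights           <- NULL
  m$prior.weights     <- NULL
  # terms/formula drag their environment (and everything captured in it) along
  environment(m$terms)   <- globalenv()
  environment(m$formula) <- globalenv()
  m
}

small_fit <- strip_glm(fit)
saveRDS(small_fit, "fit_small.rds", compress = "xz")   # xz often shrinks .rds further
predict(small_fit, newdata = head(mtcars))             # prediction still works
```

Whether this brings a particular fit object under 100 MB depends on the model; if it doesn't, external storage (DVC, S3) is the more general route.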
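
And a rough sketch of the S3 suggestion using the aws.s3 package (one possible client, not something prescribed in the question); the bucket name, object key and credentials are placeholders:

```r
# Sketch only: store the .rds in an S3 bucket instead of the git repo.
# Bucket name, object key and credentials below are placeholders.
library(aws.s3)

Sys.setenv(
  "AWS_ACCESS_KEY_ID"     = "<key id>",
  "AWS_SECRET_ACCESS_KEY" = "<secret>",
  "AWS_DEFAULT_REGION"    = "eu-central-1"
)

saveRDS(fit, "fit.rds")

# Embed a timestamp/version in the object key to track versions, as suggested above.
put_object(
  file   = "fit.rds",
  object = "models/fit_2022-03-21_v1.rds",
  bucket = "my-model-bucket"
)

# Read it back later without committing the file to the repo:
fit_again <- s3readRDS(object = "models/fit_2022-03-21_v1.rds",
                       bucket = "my-model-bucket")
```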

0 Answers