4

Hey, I have another question regarding this one: "Why not to use git for large files?" One user in the comment section mentions that Git LFS is not really made to store large CSV files. So now my question is: what would be a better way? Should I maybe just use plain Git? But then what about the problems with large files in the history?

Gring
  • 318
  • 3
  • 19
  • S3 or another cloud storage provider is an option. You can save "master" as `some-file-name.csv` and when updating it, change the name of the previous master to `some-file-name_timestamp.csv` and save the newest version as your "master" file. Depending on how large the file is and how often it changes this may not be feasible though. – bcr Aug 15 '17 at 13:30
  • The file is "only" about one gigabyte in size and does not change too often. So do you mean I can set it up in a way that the S3 storage is connected to Git? – Gring Aug 15 '17 at 13:41

1 Answer

0

For a large CSV file, you can replicate what Git LFS does and add your own smudge/clean content filter driver.
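
A minimal sketch of the wiring, assuming you call the filter `bigcsv` and keep the two scripts under `scripts/` (both names are placeholders):

```sh
# Register the filter driver in your local config (not versioned by Git)
git config filter.bigcsv.smudge ./scripts/bigcsv-smudge.sh
git config filter.bigcsv.clean  ./scripts/bigcsv-clean.sh
git config filter.bigcsv.required true

# .gitattributes (committed): route the large file through that filter
# data/big-file.csv filter=bigcsv
```

Note that, like `git lfs install`, the `git config` part has to be repeated in every clone, since local config is not stored in the repository.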

The smudge script would, on checkout, fetch your CSV file from external storage (for instance S3, as mentioned in the comments).
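
For example, the smudge script could read the small pointer stored in the repo from stdin and stream the real file back from S3 to stdout; the bucket name and the use of the AWS CLI are assumptions here:

```sh
#!/bin/sh
# bigcsv-smudge.sh -- run by Git on checkout.
# Git pipes the repo-stored content (a small pointer holding an S3 key) to
# stdin and expects the real working-tree content on stdout.
set -e
key=$(cat)                          # pointer committed in the repo, e.g. "3f2a...c9.csv"
aws s3 cp "s3://my-bucket/$key" -   # stream the real CSV from S3 to stdout
```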

The clean script would, on commit, check whether the file has changed and, if so, upload the new version back to that storage, which according to you should not happen often.
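
The clean side could hash the working-tree content, upload it only when that version is not already in the bucket, and hand Git a small pointer to commit instead of the CSV itself (again a sketch, with `my-bucket` as a placeholder):

```sh
#!/bin/sh
# bigcsv-clean.sh -- run by Git on add/commit.
# Git pipes the working-tree content to stdin and commits whatever this
# script prints on stdout; here that is just a content-addressed pointer.
set -e
tmp=$(mktemp)
cat > "$tmp"                                    # capture the working-tree CSV
key="$(sha256sum "$tmp" | cut -d' ' -f1).csv"   # content-addressed object key
# Upload only if this version is not already stored
if ! aws s3 ls "s3://my-bucket/$key" >/dev/null 2>&1; then
    aws s3 cp "$tmp" "s3://my-bucket/$key" >/dev/null
fi
rm -f "$tmp"
printf '%s\n' "$key"                            # the pointer Git actually stores
```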

That way, you avoid keeping a large text file in your Git repo.

VonC
  • 1,262,500
  • 529
  • 4,410
  • 5,250