
I have a website that is completely managed by Git. In short, I develop my website locally, and when I reach a result that deserves to be online, I simply run git push (no password needed, thanks to public-key authentication) and 3 seconds later my new website is online.
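For readers wondering how the push-to-deploy part can work: one common way is a post-receive hook on the server (a minimal sketch; the repository path, web root, and branch name are hypothetical):

    #!/bin/sh
    # Hypothetical hook file: /var/www/site.git/hooks/post-receive
    # On every push, check the pushed branch out into the web root.
    GIT_WORK_TREE=/var/www/mysite git checkout -f master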

In fact, I am completely fond of this system, and I want to use Git (which I have been learning for about 4 days) as much and as smartly as possible, for a lot of things.

For the website project, what bothers me is that I need "big" files (i.e. files that are not source code), such as images (logo, background, ...) or PDF files (CV, scientific papers). If I understand correctly, each new version of these files is stored whole, since delta compression is not effective on binary content. The old versions are usually useless (who cares about the old background, or the now-outdated CV?) and are very likely to take up a lot of disk space over time (if I update my CV at each event, for example).
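For readers who want to measure this on their own repository, a small sketch using standard Git commands (the output is of course repository-specific):

    # How much space does the object store take?
    git count-objects -vH

    # List the biggest blobs recorded anywhere in the history
    git rev-list --objects --all |
      git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
      awk '/^blob/ {print $3, $4}' | sort -rn | head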

So for this project, and for more to come, I want a Git repository containing:
- the files of the project (mostly source code), handled like any ordinary files in a Git repository, with full history and the possibility of backup
- a directory (let's call it data/) which is updated at each commit + push, but whose former versions are from then on definitively lost on the remote Git repository

The idea is to:
- avoid the explosion in repository size
- ensure that anyone who makes a pull (even of an older branch) gets, by that one action, a working local project, but with the latest data/ (the new logo and new CV on the former website, in our case)
- make my life simpler and my girlfriend happier

All ideas are welcome, but please note that my philosophy is to prefer high-level solutions.

Nilexys

3 Answers


This is where an artifact repository like Nexus can be useful.

  • You manage and deliver your sources from a Git repo,
  • but you store your artifacts (big pictures, documents, ...) in an artifact repository (even a snapshot repository, since you don't keep the history).

The idea is that:

  • it is easy to clean or delete data from an artifact repo (it is just a collection of shared folders, with a naming convention),
  • while it is hard to do the same in a DVCS (distributed version control system), designed to retain data over time, and supposedly lean enough to be cloned around (not bloated by large binary data).
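For readers, a minimal sketch of what fetching such an artifact can look like from the command line, assuming the site assets were deployed to Nexus under hypothetical Maven coordinates:

    # Resolve the assets from the artifact repository, then copy them into data/
    # (the coordinates com.example:site-assets:1.0-SNAPSHOT:zip are hypothetical)
    mvn dependency:get  -Dartifact=com.example:site-assets:1.0-SNAPSHOT:zip
    mvn dependency:copy -Dartifact=com.example:site-assets:1.0-SNAPSHOT:zip -DoutputDirectory=data/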
VonC
  • OK, that's an idea, thanks. Any advice on how to hook it into my Git repository so that **data** is downloaded at every git pull (and, if possible, updated at every git push, or by some script)? – Nilexys Aug 11 '14 at 13:54
  • @Nilexys getting an artifact means running a Maven command, based on a `pom.xml` (a text file which is versioned in your Git repo). You could add a smudge script (as in http://stackoverflow.com/a/25217391/6309) that compares the checked-out content of the pom.xml with a private copy: if that content has changed, it runs the Maven command (as in http://stackoverflow.com/a/11265280/6309). If no change is detected, it does nothing. – VonC Aug 11 '14 at 14:03
  • After digging a bit into Maven and Nexus, this appears a bit more complicated than necessary for my needs, considering that I am the only developer on these projects. – Nilexys Aug 11 '14 at 21:16
  • @Nilexys sure, git-annex might be a better fit, then. – VonC Aug 12 '14 at 05:17

Another solution (for readers) is git-annex, which is meant to manage heavy content, but:
- it requires many more commands and more management
- it apparently does not allow you to remove files unless they are copied somewhere else (not what I wanted)

Still, as an extension of Git, it can be very convenient for many purposes.
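For readers, a minimal sketch of the basic git-annex workflow (assuming git-annex is installed; the file name and the remote are hypothetical):

    git annex init "laptop"                  # turn the clone into an annex
    git annex add data/cv.pdf                # store the content in the annex, commit a symlink
    git commit -m "Add CV"
    git annex copy data/cv.pdf --to origin   # send the content to a remote annex
    git annex drop data/cv.pdf               # git-annex refuses unless another copy exists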

Nilexys
    Indeed. +1. I mentioned it (with a few others) at http://stackoverflow.com/a/12855986/6309 or http://stackoverflow.com/a/23777122/6309 – VonC Aug 11 '14 at 19:00

My final solution doesn't keep the history (though I could add that later, in an annex), but it is dramatically simple and efficient, thanks to the tiny but easy and reliable rsync. I just keep my websites on my local computer under the same names as on the server.

Here is the part I added to my .bashrc:

    # Mirror the "big" files of the current project to the server;
    # --del removes remote files that no longer exist locally.
    upload() {
        local var=$(basename "$PWD")
        rsync -az --del --progress -e ssh ~/www/"$var"/files/ admin@domain:/var/www/"$var"/files/
    }

    # Sync the big files first, then push the source code.
    deploy() {
        upload
        git push
    }
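Usage is then just, from the project directory (whose name must match the one on the server; mysite is a hypothetical example):

    cd ~/www/mysite
    deploy    # rsync the files/ directory, then git push the sources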

It is perfect for my personal use, since I manage my web pages alone.

But I have read a lot about Nexus, and I will keep it in mind for bigger projects.

For readers: there are many solutions for mid-sized projects, such as git-annex, git-fat, git-media, git-data... You no longer have any excuse not to handle your large files properly!

Nilexys