How to push and pull from github without sharing sensitive information? Smudge & clean?

Question

When I pull from github to a server repository I want to avoid overwriting localized sensitive information in certain files, for example config.php.

Note: it's not an open-source type repo; I have full control over the repository, I'm the only user, it's private, but critically, it's based on an open-source framework that might change the structure of the config files. I just want to be able to pull from it to test, staging, and production and not accidentally have production's config end up on test, etc. But I can't re-code the config files to pull data from somewhere else without making for tough merging situations later if the framework gets updated.

Ideally I'd want to be able to tell Git, when pulling, during fetching from REPO_URI, always discard any hunks that might change the information presently to be found on line 24 of FILE_PATH. However I gather that is not possible (correct me if I'm wrong).

Unless someone can offer a way to do the above, please read the solution below and let me know whether it seems like the ideal way to do this:

I would use keyword expansion as described in git's user guide here. Below I'll describe how I would do this and then at the bottom ask some questions about this approach.

Description of Method

First I'd write two scripts, "sensitive_values_inserter" and "sensitive_values_remover", that swap certain dummy keywords (that will be in the github repo master) with the particular sensitive information like passwords, usernames, database paths, etc.:

#! /bin/sh -f
sed -e 's/@USERNAME@/dummyvalue/' -e 's/@PASSWORD@/dummyvalue/' $1

etc.
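As a rough sketch of how the two directions pair up, the snippet below runs a placeholder file through an inserter and then a remover and checks that it round-trips. The function names and the values `live_user` and `live_pass` are made up for illustration; they are not from the real setup, and values containing `/` or `&` would need escaping before being handed to sed.

```shell
#!/bin/sh
# Hypothetical values for illustration only; real secrets would go here.
REAL_USER='live_user'
REAL_PASS='live_pass'

# Smudge direction: placeholders -> real values (the inserter's job).
insert_values() {
    sed -e "s/@USERNAME@/$REAL_USER/g" -e "s/@PASSWORD@/$REAL_PASS/g"
}

# Clean direction: real values -> placeholders (the remover's job).
remove_values() {
    sed -e "s/$REAL_USER/@USERNAME@/g" -e "s/$REAL_PASS/@PASSWORD@/g"
}

# A made-up config line; single quotes keep $db_user literal.
sample='$db_user = "@USERNAME@"; $db_pass = "@PASSWORD@";'

smudged=$(printf '%s\n' "$sample" | insert_values)
cleaned=$(printf '%s\n' "$smudged" | remove_values)

printf 'smudged: %s\n' "$smudged"
printf 'cleaned: %s\n' "$cleaned"
```

Both functions read stdin and write stdout, which is how git hands file content to a filter. The remover only works if the real values never occur elsewhere in the file for unrelated reasons, since it replaces every match.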

Second I would make three versions of this script, one for each environment: test/staging/production. Each version would contain the specific passwords, usernames, and database paths relevant to the environment it belongs to, instead of the dummy values. I'd place each one of these scripts in a path relative to each of these code repositories, like this:

/live/filters/sensitive_values_inserter
/live/filters/sensitive_values_remover
/live/repo/{LIVE}
/test/filters/sensitive_values_inserter
/test/filters/sensitive_values_remover
/test/repo/{TEST}
/stag/filters/sensitive_values_inserter
/stag/filters/sensitive_values_remover
/stag/repo/{STAG}

Each of these filters would have the specific values for the relevant setups.

Then each repository's local config would be modified as follows:

$ git config filter.infosafe.smudge '../filters/sensitive_values_inserter'
$ git config filter.infosafe.clean '../filters/sensitive_values_remover'

Finally in the server repository do this:

$ echo 'config.php filter=infosafe' >> .gitattributes
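One way to confirm that the attribute is actually picked up is `git check-attr`. The sketch below does this in a throwaway repository created with `mktemp`, so it does not touch a real checkout; the temporary directory is an assumption of the example only.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is modified.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Same attribute line as above.
echo 'config.php filter=infosafe' >> .gitattributes

# Ask git which filter applies to config.php (the file need not exist yet).
attr=$(git check-attr filter -- config.php)
echo "$attr"
```

If the attribute is in effect this prints `config.php: filter: infosafe`; a value of `unspecified` would mean the pattern in .gitattributes does not match the path.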

That way whenever pulling from the main server, if I understand this correctly, these filters would replace the "dummy" values with the ones I want to use.

Note: to get this to work, as pointed out in this other stackoverflow question, after setting up everything as mentioned above you must:

cd /path/to/your/repo
git stash save
git checkout HEAD -- "$(git rev-parse --show-toplevel)"
git stash pop

In between the checkout and stash pop I had to commit all the changes to the files where the clean operation had taken place. Don't worry, after you commit them, the ones in the working directory get smudged. (It's kind of counter-intuitive, but it works.)

I was able to successfully push to github and only the clean values appear.
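A minimal way to check this before pushing is to compare the committed blob with the working copy. The sketch below does so in a throwaway repository; the inline sed commands stand in for the filter scripts, and `live_user` is a made-up value, not a real credential.

```shell
#!/bin/sh
set -e

# Throwaway repository with a local identity so the commit succeeds.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.invalid
git config user.name demo
git config commit.gpgsign false

# Inline stand-ins for the inserter (smudge) and remover (clean) scripts.
git config filter.infosafe.smudge "sed -e 's/@USERNAME@/live_user/'"
git config filter.infosafe.clean "sed -e 's/live_user/@USERNAME@/'"

echo 'config.php filter=infosafe' > .gitattributes
# The working copy holds the (made-up) sensitive value.
echo '$user = "live_user";' > config.php

git add .gitattributes config.php
git commit -q -m 'add config'

# Raw blob as stored in the commit versus the file on disk.
committed=$(git cat-file -p HEAD:config.php)
working=$(cat config.php)
echo "committed: $committed"
echo "working:   $working"
```

The committed line should carry the `@USERNAME@` placeholder while the file on disk keeps the local value, since only what is stored in commits gets pushed.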

(There is an alternate, more advanced technique along these lines that involves using one .gitattributes per branch, and two drivers and two filters per branch. This allows for live passwords to be cleaned out when switching to the test branch, and vice versa. The trick is to invoke the cleaners for both branches in the .gitattributes of each branch, but only invoke the smudger of the branch that's the home of that .gitattributes, so it restores its own password. Even in this scenario, when pushing to github all sensitive information remains cleaned out, which is nice. I could go into detail on that if anyone is interested.)

Questions About This Method & Alternatives

I tested this, and it works. But...

Is there a better way to do this using git? I might add that it's not an option to just ignore the files that have the sensitive information in them, and it's not an option to ignore changes to them when merging, because I want to be able to pull changes to these files while retaining certain configuration values. That is why I don't want to simply use git update-index --assume-unchanged FILENAME to permanently ignore future local modifications to the entire files.

Thanks.

For what it's worth, the usual approach (or at least my usual) is not to put the information there in the first place. User name and passwords come out of `~/.netrc` for instance. — torek, Aug 28 '14 at 21:43
If I put the usernames and passwords in ~/.netrc then the software would not have them anymore. — CommaToast, Aug 28 '14 at 23:01
Yes, that's the general idea, if they're not in the software they can't wind up in any commit. The software can retrieve them from those (external, user-supplied) files. — torek, Aug 28 '14 at 23:10
"The software can retrieve them" only if the software is programmed to do so, which in the case of the software I'm talking about, it's not. Programming it to retrieve the values would definitely be one possible solution but it would involve hard-coding a path that would need to be changed depending on if it's in the test server or on live, and having such variations would require different branches for each, which the method I'm proposing (filters) would avoid. — CommaToast, Aug 28 '14 at 23:16
Note, I removed the "<" character from the `sed` line because it was causing the error, "$1: ambiguous redirect". Having removed that, now it's happy. I also added a note about how to add this to a repo. — CommaToast, Aug 28 '14 at 23:21
Yes, in this case you're dealing with software. :-) That's why these are all just comments... Overall your approach looks workable, anyway. — torek, Aug 28 '14 at 23:24
Critically I forgot to mention the site is based on an open-source framework that might change the structure of the config files later. And yes you could easily call this framework because of the fact they made it that way, but I can't change it now. I'm glad I discovered this solution though because it's taught me about a pretty darn powerful aspect of git: filters. — CommaToast, Aug 29 '14 at 01:13
You can default to the path where it's looking now but override it with the value of an environment variable. Do it to upstream's standards - get on the dev mailing list and explain what you're doing and why, they'll tell you exactly how to offer a commit they'll apply. — jthill, Aug 29 '14 at 01:50
I hear what you're saying but I actually like the git filters method better than using environment variables because it's more portable. With environment variables if we move servers that's just one more thing to have to configure on the server. With git filters we can tarball the webroot parent and redeploy it somewhere else without additional setup. — CommaToast, Aug 29 '14 at 20:24
