
I have a local git repository in which I worked for some time using my real name and my personal email address. Nobody else has ever worked on it and it has never been pushed to a remote.

Now I want to push this repository to a public remote so that some other people can work on it, but before doing so I need to completely remove all my sensitive info and replace it with other values, for example:

  • Name: John Doe → Ntakwetet
  • Email: johndoe@gmail.com → user1234@hiddenmail.com

How can I do that in a way that completely removes the original data, making it impossible to retrieve it in any way from the uploaded repository?

I found some guides suggesting git filter-branch and it seems to work, but I also read (unfortunately I can't find the pages anymore) that it leaves traces in some log files, and I need to make my data completely unrecoverable.

Please consider that I'm new to git. Also, if more info is needed, please ask for it.
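The "traces" mentioned above are most likely the backup refs that `git filter-branch` writes under `refs/original/` and the reflogs under `.git/logs/`, whose entries record the name and email of whoever updated each ref. A minimal sketch for listing both, assuming a POSIX shell and that it is run inside the repository (`find_local_traces` is a hypothetical helper name):

```shell
#!/bin/sh
# Minimal sketch: list local leftovers that can still expose the old
# identity after `git filter-branch`. Run it inside the repository.
# find_local_traces is a hypothetical helper name.
find_local_traces() {
    old_email=$1

    echo "== backup refs written by filter-branch (refs/original/) =="
    git for-each-ref --format='%(refname)' refs/original/

    echo "== reflog files mentioning $old_email =="
    # Each reflog line records the name and email of whoever moved the ref.
    gitdir=$(git rev-parse --git-dir) || return 1
    if [ -d "$gitdir/logs" ]; then
        # grep exits with 1 when nothing matches; that is not an error here.
        grep -rlF -- "$old_email" "$gitdir/logs" || true
    fi
}
```

Usage: `find_local_traces johndoe@gmail.com`. Both kinds of trace stay in the local `.git` directory: reflogs are never pushed, and `git push origin --all` only pushes branches (`refs/heads/*`), not `refs/original/*`.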


Update

I built this hypothetical series of commands:

user@pc:~/mylocalrepo$ git filter-branch --env-filter '
         if test "$GIT_AUTHOR_EMAIL" = "johndoe@gmail.com"
         then
                 GIT_AUTHOR_EMAIL=user1234@hiddenmail.com
         fi
         if test "$GIT_AUTHOR_NAME" = "John Doe"
         then
                 GIT_AUTHOR_NAME=Ntakwetet
         fi
         
         if test "$GIT_COMMITTER_NAME" = "John Doe"
         then
                 GIT_COMMITTER_NAME=Ntakwetet
         fi
         if test "$GIT_COMMITTER_EMAIL" = "johndoe@gmail.com"
         then
                 GIT_COMMITTER_EMAIL=user1234@hiddenmail.com
         fi
' -- --all

user@pc:~/mylocalrepo$ git remote add origin https://github.com/ntakwetet/remoterepo
user@pc:~/mylocalrepo$ git push origin --all

Does it do what I need?
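One way to check the result of that `git filter-branch` run before pushing is to list every author and committer identity reachable from the local branches and tags and make sure the old one is gone. A minimal sketch, assuming a POSIX shell; `check_identities` is a hypothetical helper, and the arguments are the placeholder name and email from the question. It uses `--branches --tags` rather than `--all` on purpose: after `git filter-branch`, `--all` also walks the `refs/original/` backups, which still point at the old commits but are not pushed by `git push origin --all`.

```shell
#!/bin/sh
# Minimal sketch: print every distinct author/committer identity reachable
# from local branches and tags, and fail if the old name or email remains.
# check_identities is a hypothetical helper name.
check_identities() {
    old_name=$1
    old_email=$2

    # --branches --tags instead of --all: --all would also include the
    # refs/original/ backups left by filter-branch (old commits, not pushed).
    idents=$(git log --branches --tags --format='%an <%ae>%n%cn <%ce>' | sort -u)
    printf '%s\n' "$idents"

    if printf '%s\n' "$idents" | grep -qF -e "$old_name" -e "$old_email"; then
        echo "old identity still present; do not push yet" >&2
        return 1
    fi
    return 0
}
```

Usage: `check_identities 'John Doe' johndoe@gmail.com && git push origin --all`. If the old identity is still found through a tag, note that `git filter-branch` only rewrites tags when it is given `--tag-name-filter cat`.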

Ntakwetet
  • Assuming that commits are _pushed_ to this public repo (such that any residual orphan commits can be ignored), then to ensure the 'information' is not exposed: _do not push any commit which is a **descendant** of any of the commits with oopsie information_, even if the information was reverted / changed in a later commit. Performing any required history rewriting (including any information in commit messages) _before_ pushing any commits accomplishes this. – user2864740 Aug 25 '20 at 23:51
  • @user2864740 I understood the theory, but I'm new to git and I'm not sure about how to apply it. Would my hypothesis (see question update) work? – Ntakwetet Aug 26 '20 at 13:53
  • This has been [asked and answered](https://stackoverflow.com/search?q=remove+password+%5Bgit%5D) before. The correct answer is long and tedious, so no one's going to repost the best answer. https://stackoverflow.com/questions/872565/remove-sensitive-files-and-their-commits-from-git-history https://stackoverflow.com/questions/52339376/remove-password-from-git – jpaugh Aug 26 '20 at 14:07
  • Does this answer your question? [Remove sensitive files and their commits from Git history](https://stackoverflow.com/questions/872565/remove-sensitive-files-and-their-commits-from-git-history) – jpaugh Aug 26 '20 at 14:08
  • @jpaugh The links you provided don't answer my question. [The first](https://stackoverflow.com/questions/872565/remove-sensitive-files-and-their-commits-from-git-history) is about removing sensitive data _stored in files_ and [the second](https://stackoverflow.com/questions/52339376/remove-password-from-git) is about preventing git from automatically selecting the committer. What I want is to remove sensitive data _from all git history_. – Ntakwetet Aug 26 '20 at 16:20
  • @Ntakwetet Exactly where do you expect the sensitive information to live, if not in files? It could live in file contents, file names, or commit messages. There are SO posts about cleaning up all of those. – jpaugh Aug 26 '20 at 17:34
  • @jpaugh I didn't explain it well. The answer you suggested is about removing sensitive data from files that are part of the project. What I need is to remove/edit data in commit details and in git's history and backups, which are stored in git's own files (within the `.git` directory). Or at least that's what I understood. I'm new to git, so I might be missing something. – Ntakwetet Aug 28 '20 at 21:23
  • @Ntakwetet Okay. Here's how I understand it. The history consists of file names (tree objects), file contents (blobs) and commits, all at different points in time. Removing data from the history of a file (for example) means removing the info from that file *right now*, and also repeating for every commit that contains that file. – jpaugh Sep 03 '20 at 20:54
  • I don't remember the links I sent you, but there are definitely answers on [SO] that teach you how to alter the history. `git filter-branch` or `git rebase` are likely to be involved. Once you have your history the way you want it, you can use [git gc --prune](https://git-scm.com/docs/git-gc#Documentation/git-gc.txt---pruneltdategt) to remove the old blobs --- but be absolutely sure you got everything right before you do. – jpaugh Sep 03 '20 at 20:57
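As user2864740's comment points out, commit messages need checking too, and the `--env-filter` in the question only rewrites the author and committer fields. A minimal sketch for searching commit messages and file contents across everything reachable from branches and tags, assuming a POSIX shell (`find_in_history` is a hypothetical name). It searches the full tree of every commit, so a match in an old file is reported once for each later commit that still contains it.

```shell
#!/bin/sh
# Minimal sketch: look for a leftover name or email in commit messages and
# in file contents of every commit reachable from branches and tags.
# find_in_history is a hypothetical helper name.
find_in_history() {
    pattern=$1

    echo "== commit messages =="
    # -i: case-insensitive, -F: treat the pattern as a fixed string.
    git log --branches --tags -i -F --grep="$pattern" --format='%h %s'

    echo "== file contents =="
    git rev-list --branches --tags | while read -r rev; do
        # -I skips binary files; git grep exits with 1 on no match.
        git grep -I -n -i -F -e "$pattern" "$rev" || true
    done
}
```

Usage: run it once with the name and once with the email, e.g. `find_in_history 'John Doe'` and `find_in_history johndoe@gmail.com`. Anything it reports needs its own rewrite (for messages, for example, `git filter-branch --msg-filter`), which an identity-only `--env-filter` does not do.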
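To make jpaugh's description concrete for this case: the name and email are not stored in any file of the project but in the header of each commit object, and a commit's hash is computed over that whole object. Changing them therefore produces new commits, and new hashes for all of their descendants. A minimal sketch that prints those header lines for one revision, assuming a POSIX shell (`show_commit_identity` is a hypothetical name):

```shell
#!/bin/sh
# Minimal sketch: show the author and committer lines stored in the raw
# commit object. show_commit_identity is a hypothetical helper name.
show_commit_identity() {
    rev=${1:-HEAD}
    # The header ends at the first blank line; the message follows it.
    git cat-file -p "$rev" | sed '/^$/q' | grep -E '^(author|committer) '
}
```

Usage: `show_commit_identity HEAD~3` prints lines of the form `author <name> <<email>> <timestamp> <timezone>` and the matching `committer` line.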
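The `git gc --prune` step jpaugh mentions only removes objects that nothing refers to anymore, so the `refs/original/` backups and the reflog entries that still point at the old commits have to go first. A minimal sketch of that sequence, assuming a POSIX shell (`purge_old_history` is a hypothetical name). It cannot be undone, and it only tidies the local repository: it does not change what `git push origin --all` sends.

```shell
#!/bin/sh
# Minimal sketch: drop local references to the pre-rewrite commits, then
# delete the objects that became unreachable. Irreversible.
# purge_old_history is a hypothetical helper name.
purge_old_history() {
    # 1. Delete the backup refs filter-branch keeps under refs/original/.
    git for-each-ref --format='delete %(refname)' refs/original/ |
        git update-ref --stdin || return 1
    # 2. Expire all reflog entries now, so they stop keeping old commits alive.
    git reflog expire --expire=now --all || return 1
    # 3. Remove unreachable objects immediately instead of after the
    #    default grace period.
    git gc --prune=now --quiet
}
```

Run it only once the rewritten history looks right.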

1 Answer


If you haven't pushed anything yet, then cleaning locally is enough.

Do make sure that you have cleansed all of the history you are pushing, though.
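One way to follow that advice is to inspect a throwaway clone. A clone made with `--no-local` receives the local repository's branches and tags and the objects reachable from them, but none of the `refs/original/` backups, reflogs or unreachable objects; without that flag, cloning a local path copies or hardlinks the whole `.git/objects` directory. A minimal sketch, assuming a POSIX shell with `mktemp` (`inspect_clean_clone` is a hypothetical name):

```shell
#!/bin/sh
# Minimal sketch: clone the local repository through the normal Git
# transport into a temporary directory and list every identity in it.
# inspect_clean_clone is a hypothetical helper name.
inspect_clean_clone() {
    src=$1
    dest=$(mktemp -d) || return 1
    # --no-local: transfer only reachable objects instead of copying
    # or hardlinking .git/objects as-is.
    git clone -q --no-local "$src" "$dest/clone" || return 1
    git -C "$dest/clone" log --all --format='%an <%ae>%n%cn <%ce>' | sort -u
    rm -rf "$dest"
}
```

Usage: once the rewrite has worked, `inspect_clean_clone ~/mylocalrepo` should print only `Ntakwetet <user1234@hiddenmail.com>`.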

LeGEC
  • I understood the theory, but I'm new to git and I'm not sure about how to apply it. Would my hypothesis (see question update) work? – Ntakwetet Aug 26 '20 at 13:55