
About a year and a half ago, while I was still learning Git, I wrote a bot that created commits for me in a repo I was testing on. I wanted to look cool by having thousands of commits.

Essentially, the bot appended a line to a file, staged it, committed it, and pushed it. As a result, more than 54,000 commits are worthless. How would I remove all of those commits? Is this a good idea?

The worthless commits, which are the ones I want removed, sit in the middle of the history, starting at 0c4068fb3 and ending at 42b8fae4b. The legitimate commits come before 0c4068fb3 and after 42b8fae4b. The worthless commits are easy to detect: when I wrote the bot, I put all of my existing commit messages into a list, and the bot picked one at random for each commit. So any commit message that repeats many times belongs to a worthless commit. Most of the worthless commits also say "first commit" or something similar.

So, here's the link to the commits section where the fake ones start. As you can tell, the commit messages keep repeating themselves.

The only content in the fake commits is one more line appended to a file called bot.txt. So there is nothing of any value in those commits.

If not, could you tell me how I can remove all 54000+ commits and just keep the ones that actually have value?

Thanks

  • Your title says "squash" but the question says "remove". Are you trying to *remove* the commits as if they never happened, or are you trying to squash them all into a single commit that still has the changes? – TTT May 20 '22 at 16:33
  • @TTT I don't know. Whichever one you think is better, because I only know the bare bones of git. I'll update the title so it won't be misleading. –  May 20 '22 at 16:34
  • For the "ones that actually have value", where do they fall in the history graph? For example, some good commits before the 54k, some good commits mixed in with the 54k, some good commits after the 54k? – TTT May 20 '22 at 16:36
  • The commits that don't have value are in the middle, starting at `0c4068fb3` and ending at `42b8fae4b`. So all the commits that have value are before `0c4068fb3` and after `42b8fae4b`. –  May 20 '22 at 16:39
  • Please add more details to your post. What is the difference between the 54k and your _genuine_ commits? How could a script tell the difference? – CyanCoding May 20 '22 at 16:39
  • To me it looks as if the "artificial" commit messages don't look artificial at all, and rather, these commits could be detected more easily because they only add an empty line to the file `bot.txt`. Maybe the solution to your problem is to just remove the file `bot.txt` from the history of the project? – mkrieger1 May 20 '22 at 16:52
  • How would I go about doing that? –  May 20 '22 at 16:53
  • For example like [this](https://stackoverflow.com/questions/35115585/remove-files-completely-from-git-repository-along-with-its-history) or like [this](https://stackoverflow.com/questions/43762338/how-to-remove-file-from-git-history) – mkrieger1 May 20 '22 at 16:56
  • Added an answer but I now realize your history is not linear. Do any of the good commits also edit `bot.txt`? If no, then git-filter-repo is the answer, which is in the links @mkrieger1 linked to. (I think it's the second or third answer in both links. Don't use filter-branch, filter-repo is far superior.) – TTT May 20 '22 at 17:13

1 Answer


Update: In your particular case, since all of the commits you wish to remove are writing to a file called bot.txt, and none of the commits you wish to keep write to that file, the simplest course of action is to use git-filter-repo to remove that single file from the entire history. Any commits that only touched that file will fall out of the new re-written repo. The result will be a similar repo without those 54K commits.
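As a rough sketch of that approach, the script below builds a throwaway repository (the file `app.txt`, the commit messages, and the `commit_line` helper are made up for illustration), counts how many commits touch `bot.txt`, and then removes that path from history. It assumes git-filter-repo is installed as a separate tool; if it is not, the rewrite step is skipped. On the real repository you would run the `git filter-repo` line in a fresh clone, where `--force` should not be needed.

```shell
#!/bin/sh
# Sketch on a throwaway repo; everything except bot.txt is a made-up example.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

# Hypothetical helper: append line $2 to file $1 and commit with message $2.
commit_line() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit_line app.txt "good 1"
commit_line bot.txt "bot 1"
commit_line bot.txt "bot 2"
commit_line bot.txt "bot 3"
commit_line app.txt "good 2"

before=$(git rev-list --count HEAD)
bot_only=$(git rev-list --count HEAD -- bot.txt)
echo "before: $before commits, $bot_only of them touch bot.txt"

# Drop bot.txt from every commit; commits left empty are pruned by default.
# --force is used only because this demo repo is not a fresh clone.
if git filter-repo --version >/dev/null 2>&1; then
    git filter-repo --force --invert-paths --path bot.txt
    after=$(git rev-list --count HEAD)
    echo "after: $after commits"
else
    echo "git-filter-repo is not installed; skipping the rewrite"
fi
```

Because the rewrite changes commit hashes from the first removed commit onward, the rewritten branch has to be force-pushed, and any other clones need to be re-cloned or reset.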

Previous Answer: This answer might still work for you, but as written it is intended for the more general case of a linear history. By adding the option --rebase-merges to the final rebase command below, you may still be able to accomplish your goal on a non-linear history. The main difference is that here the 54K commits are squashed into one commit. If that range both creates the file and finally deletes it in the last commit, the single squashed commit would contain no changes and would fall out of the repo as well.

Based on some information from the comments:

TTT asked:

For the "ones that actually have value", where do they fall in the history graph? For example, some good commits before the 54k, some good commits mixed in with the 54k, some good commits after the 54k?

And you answered:

The commits that don't have value are in the middle, starting at 0c4068fb3 and ending at 42b8fae4b. So all the commits that have value are before 0c4068fb3 and after 42b8fae4b

If your history is linear, this is straightforward using the squash method, and I can guarantee you there won't be conflicts.

    # Let's assume you are rewriting branch: master

    # Back it up for sanity purposes
    git branch master-backup master

    # create a new branch
    git switch -c temp-branch 42b8fae4b # start a branch from last "bad" commit
    git reset --soft 0c4068fb3~1 # reset back to commit before first "bad" commit

    # Note right now you have only "good" commits on your branch,
    #  and all "bad" commit changes are staged, let's make 1 big commit
    git commit -m "Squash all automated commits into one"

    # now rebase the remaining commits on master
    git rebase --onto temp-branch 42b8fae4b master
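To see why the squash cannot conflict, it may help to replay the same steps on a small throwaway repository and compare the result with the backup. The sketch below is only an illustration: `app.txt`, the commit messages, the `commit_line` helper, and the variables `first_bad` and `last_bad` (standing in for 0c4068fb3 and 42b8fae4b) are all made up.

```shell
#!/bin/sh
# Sketch on a throwaway linear history; names here are illustrative only.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.email demo@example.com
git config user.name Demo

# Hypothetical helper: append line $2 to file $1 and commit with message $2.
commit_line() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit_line app.txt "good 1"
commit_line bot.txt "bot 1"; first_bad=$(git rev-parse HEAD)
commit_line bot.txt "bot 2"
commit_line bot.txt "bot 3"; last_bad=$(git rev-parse HEAD)
commit_line app.txt "good 2"

# Same steps as in the answer, with the demo hashes substituted.
git branch master-backup master
git switch -q -c temp-branch "$last_bad"
git reset -q --soft "$first_bad~1"
git commit -q -m "Squash all automated commits into one"
git rebase -q --onto temp-branch "$last_bad" master

# Five commits became three, and the final content is unchanged.
count=$(git rev-list --count master)
if git diff --quiet master-backup master; then same_tree=yes; else same_tree=no; fi
echo "commits on master: $count, same content as backup: $same_tree"
```

The squashed commit leaves the files exactly as they were at the last "bad" commit, so the later commits are replayed onto the same content they were originally written against.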

Note it's also fairly simple to remove the bad commits instead of squashing, but you can't guarantee there won't be conflicts (unless you happen to know there won't be).
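A minimal sketch of the removal variant, again on a throwaway linear history with made-up names (`app.txt`, `commit_line`, `first_bad`, `last_bad`): the commits after the last "bad" commit are replayed directly onto the parent of the first "bad" commit, so the range in between is dropped. There are no conflicts in this demo only because the later commit never touches `bot.txt`.

```shell
#!/bin/sh
# Sketch on a throwaway linear history; names here are illustrative only.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.email demo@example.com
git config user.name Demo

# Hypothetical helper: append line $2 to file $1 and commit with message $2.
commit_line() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit_line app.txt "good 1"
commit_line bot.txt "bot 1"; first_bad=$(git rev-parse HEAD)
commit_line bot.txt "bot 2"
commit_line bot.txt "bot 3"; last_bad=$(git rev-parse HEAD)
commit_line app.txt "good 2"

git branch master-backup master

# Replay everything after last_bad onto the commit just before first_bad.
git rebase -q --onto "$first_bad~1" "$last_bad" master

count=$(git rev-list --count master)
files=$(git ls-tree -r --name-only master)
echo "commits on master: $count, files: $files"
```

On the real repository the two hashes would be 0c4068fb3 and 42b8fae4b, and a non-linear history would likely need `--rebase-merges` as well.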

TTT
  • Oh darn. I now see the history is **not** linear... I may need to delete this answer, or perhaps `--rebase-merges` will work on the rebase command. TBD – TTT May 20 '22 at 17:10
  • When I try to create the temporary branch, I keep getting the error: `error: The following untracked working tree files would be overwritten by checkout:`, but when I run `git status`, I get: `Your branch is up to date with 'origin/master'. nothing to commit, working tree clean`. The files that git lists are from a couple of my submodules, which are my `dotfiles`, `dwm`, and `local`. –  May 20 '22 at 17:39
  • @SingularisArt I think you may have prematurely accepted my answer, since I don't think it's going to work for you as written, which depends on your history being linear, and I can see that yours is not. – TTT May 20 '22 at 17:49
  • @SingularisArt do any "good" commits in your repo write to the `bot.txt` file? (I think we've already established that all of the "bad" commits only write to `bot.txt`.) – TTT May 20 '22 at 17:50
  • No. Only "bad" commits write to the `bot.txt` file. –  May 20 '22 at 18:22
  • @SingularisArt in that case I would recommend using git-filter-repo. I updated the answer to reflect that. One of the steps of using git-filter-repo is making a new clone. When you do that you shouldn't have to deal with the submodules while re-writing your repo. – TTT May 20 '22 at 18:45