
PLEASE NOTE: This question is different from "Squash my last X commits together using Git": we are not looking to squash the last X commits into a single commit. Instead, we want to consolidate the initial commits into a single commit and retain the last four commits / code state in an automated way (without having to manually pick commits).

We are using Git to back up / record changes to a file datastore, using an internally hosted GitLab server with a repository that contains some very large files.

We would like to consolidate earlier redundant commits that are no longer required to reduce the size of the repo, but keep the current code state and the last four commits as backups in case we need to restore our datastore to a previous commit.

Which commands are recommended within an automated script that would change the following git history:

0    aabbcc Initial commit
1    aabbdd First backup
2    aabbee Second backup
3    aabbff Third backup
4    aabbgg Fourth backup
5    aabbhh Fifth backup
6    aabbii Sixth backup
7    aabbjj Seventh backup (current code state)

To become the following without losing the current code state:

4    aabbgg Initial commit -> Fourth backup (consolidated)
5    aabbhh Fifth backup
6    aabbii Sixth backup
7    aabbjj Seventh backup (current code state)
Steve
  • Why do you think you can remove the top 4 commits without changing the code state? That's not possible, most likely. – Tim Biegeleisen Jul 24 '18 at 11:28
  • @TimBiegeleisen we need to consolidate all previous commits into a single commit but keep last four commits.. is this not possible with git? – Steve Jul 24 '18 at 12:01
  • @TimBiegeleisen the question is different to the one marked as duplicate, as we want to consolidate our initial commits into a single commit and keep the last four commits in an automated way, whereas the proposed duplicate question's answers explain how to squash the last commits rather than the initial ones, and require manual user input. – Steve Jul 24 '18 at 12:34
  • `git checkout aabbgg && git reset --soft aabbcc^ && git commit -m "Initial commit -> Fourth backup (consolidated)" && git tag new_base && git checkout -b consolidated master && git rebase --onto new_base aabbgg` – sergej Jul 24 '18 at 12:34
  • @Steve The duplicate link I previously mentioned or Serge's soft reset approach are basically the two ways to squash commits. But, both may require manual intervention in general. – Tim Biegeleisen Jul 24 '18 at 12:55

2 Answers


Maybe something like this.

First, squash the older commits:

git checkout aabbgg         # checkout the commit that you want to squash the older commits into
git reset --soft aabbcc^    # squash the commits (use plain aabbcc if it is the root commit, which has no parent)
git commit -m "Initial commit -> Fourth backup (consolidated)"
git tag new_base            # tag it, we will use the tag later

Next, rebase the newer commits onto the new base:

git checkout -b consolidated current_code_state
git rebase --onto new_base aabbgg

Note: To keep the rebase from stopping at a conflict, you might need to pass a merge strategy to git rebase, for example:

-s recursive -X ours -X no-renames

Result:

x - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 (current_code_state)
 \
  4' (new_base) - 5' - 6' - 7' (consolidated)

There should be no difference between current_code_state and consolidated.
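To try the whole sequence and check that claim before touching a real repository, here is a minimal sketch that replays the steps in a throwaway repository. The temporary directory, the file names and the "Demo" identity are assumptions made for the demonstration; the branch and tag names are the ones used above. It relies on git diff --quiet, which exits with 0 only when the two trees are identical.

```shell
#!/bin/bash
set -e

# Throwaway repository with eight commits (0..7), mirroring the example history.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/current_code_state
git config user.name "Demo"
git config user.email "demo@example.com"
for i in 0 1 2 3 4 5 6 7; do
    echo "backup $i" > data.txt
    echo "$i" > "file_$i.txt"
    git add -A
    git commit -q -m "Backup $i"
done

root=$(git rev-list --max-parents=0 HEAD)   # commit 0, which has no parent
squash_to=$(git rev-parse HEAD~3)           # commit 4

# Squash commits 1..4 into a single commit on top of the root commit.
git checkout -q "$squash_to"
git reset --soft "$root"
git commit -q -m "Initial commit -> Fourth backup (consolidated)"
git tag new_base

# Replay commits 5..7 onto the squashed commit.
git checkout -q -b consolidated current_code_state
git rebase -q --onto new_base "$squash_to"

# Compare the final trees of the old and the rewritten branch.
if git diff --quiet current_code_state consolidated; then
    result="identical"
else
    result="different"
fi
commit_count=$(git rev-list --count consolidated)
echo "$result, $commit_count commits on consolidated"
```

Because the reset stops at the root commit itself rather than at its (non-existent) parent, the rewritten branch ends up with the root commit, the squashed commit and the three replayed commits.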

Finally, if everything went well, delete the original branch and the previously created tag.

git branch -D current_code_state
git tag -d new_base
sergej
  • is the ^ character essential? Having issues running the following command: `git reset --soft aabbcc^` But can run successfully by omitting the ^. Have almost finished writing the automated script based on your answer - will post asap. Many thanks – Steve Jul 24 '18 at 14:45
  • `^` means "its parent". However, if `aabbcc` is the very first commit, it does not have a parent. – sergej Jul 24 '18 at 14:56
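An automated script can test for that case instead of hard-coding the `^`. The following is only a sketch: has_parent is a hypothetical helper name, and the two-commit repository exists purely to exercise it. It uses git rev-parse --verify --quiet, which fails silently when the revision cannot be resolved.

```shell
#!/bin/bash
set -e

# Throwaway repository with two commits, used only to exercise the check.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo one > f.txt
git add f.txt
git commit -q -m "Initial commit"
echo two > f.txt
git commit -q -am "First backup"

# Hypothetical helper: succeeds if the given commit has at least one parent.
has_parent() {
    git rev-parse --verify --quiet "$1^" > /dev/null
}

first=$(git rev-list --max-parents=0 HEAD)
latest=$(git rev-parse HEAD)

if has_parent "$first"; then first_has_parent=yes; else first_has_parent=no; fi
if has_parent "$latest"; then latest_has_parent=yes; else latest_has_parent=no; fi
echo "root: $first_has_parent, latest: $latest_has_parent"
```

A script could then reset to "$commit^" when has_parent succeeds and to "$commit" otherwise.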

Based on sergej's answer (Git: Preserving current code state and last four commits) I have written an automated script that seems to be working as desired:

#!/bin/bash

function gitConsolidation() {

    # Default settings
    numCommitsToKeep=4
    branchName="master"
    path="/home/steve/test/testgit"

    # Set working directory
    cd "$path" || return 1

    # Get git repo name
    gitRepoName=$(basename "$(git rev-parse --show-toplevel)")

    # Print default message
    echo -e "** Preparing to consolidate current Git Repo: $gitRepoName **"
    echo -e "Branch: $branchName"
    echo -e "Path: $path"
    echo -e "Total past commits to keep: $numCommitsToKeep\n"

    # Get required branch
    git checkout "$branchName"

    # Get size before consolidation
    echo -e "Repo size before consolidation: $(du -hs)" 

    # Print current log list
    echo -e "\n* Git commits prior to consolidation *"  
    git log --pretty="%H - %s"

    # Get initial commit hash
    initialCommitHash=$(git rev-list --max-parents=0 HEAD)
    echo -e "\n* Found initial commit hash: $initialCommitHash *"

    # Get hash for commit to be consolidated with initial commit
    consCommitHash=$(git log --format=%H | head -n "$numCommitsToKeep" | tail -n 1)
    echo -e "* Found hash for commit to consolidate with initial commit: $consCommitHash *"

    # Get hash for latest commit
    latestCommitHash=$(git rev-parse HEAD)
    echo -e "* Found hash for latest commit $latestCommitHash *\n"

    # Begin consolidation
    echo -e "* BEGIN: Git repo consolidation *"

    # Checkout commit to consolidate with initial commit
    git checkout "$consCommitHash"

    # Soft reset to the initial commit (keeps the index and working tree of $consCommitHash)
    git reset --soft "$initialCommitHash"

    # Commit changes
    git commit -m "Consolidated commit $initialCommitHash -> $consCommitHash"

    # Set tag
    git tag new_base

    # Create the new branch at the latest commit
    git checkout -b consolidated "$latestCommitHash"

    # Rebase the newer commits onto the consolidated commit
    git rebase --onto new_base "$consCommitHash"

    # Get size after consolidation
    echo -e "Repo size after consolidation: $(du -hs)"

    # Print current log list
    echo -e "\n * Git commits after consolidation *"    
    git log --pretty="%H - %s"

    echo -e "\n* END: Git repo consolidation *"
}

# Call function
gitConsolidation

I have successfully run this on a local test repo and am about to test it on a copy of our massive repo to see if it works as desired!
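One thing to keep in mind: as written, the script leaves master and the old commits in place, so the "after" size should not be expected to drop. The old objects only become removable once nothing refers to them any more (branches, tags, reflog entries). A minimal sketch of that follow-up step, run against a throwaway repository; the six-commit history, the file name and the "Demo" identity are assumptions for the demonstration:

```shell
#!/bin/bash
set -e

# Throwaway repository with six commits on master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
for i in 0 1 2 3 4 5; do
    echo "backup $i" > data.txt
    git add data.txt
    git commit -q -m "Backup $i"
done
old_tip=$(git rev-parse HEAD)
root=$(git rev-list --max-parents=0 HEAD)
base=$(git rev-parse HEAD~1)

# Same consolidation steps as in the script, keeping the last two commits.
git checkout -q "$base"
git reset --soft "$root"
git commit -q -m "Consolidated commit"
git tag new_base
git checkout -q -b consolidated "$old_tip"
git rebase -q --onto new_base "$base"

# Point master at the rewritten history and drop the helper branch and tag.
git checkout -q -B master consolidated
git branch -q -D consolidated
git tag -d new_base > /dev/null

# Drop reflog entries that still reference the old commits, then prune them.
git reflog expire --expire=now --all
git gc -q --prune=now

master_count=$(git rev-list --count master)
if git cat-file -e "$old_tip" 2> /dev/null; then
    old_tip_present=yes
else
    old_tip_present=no
fi
echo "commits on master: $master_count, old tip still present: $old_tip_present"
```

On a hosted remote the rewritten branch would also have to be force-pushed, and the server keeps its own copy of the old objects until it runs its own cleanup, so the local size and the server-side size can differ for a while.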

Steve
  • While you *can* make Git do this (your script looks workable), you should note that what you are doing amounts to using a source code management system as a backup system. SCMs are not designed as backup systems (nor vice versa) which is why it's relatively painful like this. A well designed backup system will make it very clean and simple to keep N backups (for any suitable N). I have a few more notes on backups versus SCMs in the first chapter of my long-stalled [book](http://web.torek.net/torek/tmp/book.pdf). – torek Jul 24 '18 at 16:49
  • @torek agreed - unfortunately this was a requirement by the client due to their development workflow - which included many large data files that are dependent on the application code and must be in sync with the versioning of the application code. There is also an independent backup system being used to backup the environment to Veeam snapshots / tape drives ;) – Steve Jul 24 '18 at 16:59