
We need to import a Git repository from one network to another. For security reasons it must pass as text files that represent a readable version of the inner files (all the blobs of the repository, or at least all the diffs between two blobs). A tar of the .git directory isn't acceptable (the security scans can't read it), and neither is the git bundle format.

The promising candidate is patch files (the security scan knows how to handle binaries encoded as base64 inside a patch). This works perfectly for a linear repo, but we need merge commits as well. We thought about a script that runs git log -p -m --parents --first-parent > file on the source, and on the destination parses that file and:

  • Applies it commit by commit with git am
  • At each branching point (the second child of one parent), creates a branch (named branch<hash-of-first-commit>)
  • For each merge, runs git merge <branch-name> --no-commit -s ours, then git apply with the first-parent diff, then git commit ...

After that we just need to rename the branches to their real names, and we're done. I don't see any flaws, but it's not implemented yet.
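
For what it's worth, the merge step from the list above might look roughly like this. It is only a sketch: replay_merge and its three arguments are hypothetical names, the patch file is assumed to hold exactly the first-parent diff of the merge, and author and date information is not carried over.

```shell
# Hypothetical helper: recreate one merge commit from its first-parent diff.
# $1 = branch to merge, $2 = patch file with the first-parent diff,
# $3 = commit message. Runs in a subshell so set -e stays local.
replay_merge() (
    set -e
    branch=$1
    patch=$2
    message=$3
    # Record the second parent but leave our tree untouched for now.
    git merge --no-commit -s ours "$branch"
    # Bring index and worktree to the merge result; an empty diff means
    # the merge result equals the first parent, so there is nothing to apply.
    if [ -s "$patch" ]; then
        git apply --index "$patch"
    fi
    git commit -m "$message"
)
```

It would be called from inside the destination repository as replay_merge <branch-name> <patch-file> "<message>".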

Questions:

  • Is there an existing, working solution (script, tool) for this purpose?
  • If not, are there any flaws in the process I described that would keep it from exactly reflecting the commits and the connections between them? (We don't have to keep the original commit hashes.)
Yaakov Shoham

1 Answer


This is the craziest thing I've heard in a while.

The below is completely untested. Use at your own risk.

See the link below. For now I am just going to use git rev-list --objects HEAD, which only lists objects reachable from HEAD, as a stand-in for what you would really need to do to get all objects.

Getting all objects in a database

Wrap a simple script around getting your objects: figure out what type each object is, then store the object type, the SHA, and the content in text format. You might create 4 directories (blob, commit, tree, tag) and store one file per object in the matching directory, named after the SHA and containing the text of that object.

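#!/bin/sh
# Dump every object reachable from HEAD as <path>/<type>/<sha>.
# The blob, commit, tree and tag directories under <path> must already exist.
for sha in $(git rev-list --objects HEAD | awk '{print $1}')
do
   objType=$(git cat-file -t "${sha}")
   git cat-file -p "${sha}" > <path>/"${objType}"/"${sha}"
done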

Theoretically anyway, this will create an all-text representation of your objects (blobs are written out as-is, so binary files stay binary). Transfer them over, then reverse the process.

For blob files you would use

git hash-object -w --stdin < <path>/blob/${sha}

For trees you would use

git mktree < <path>/tree/${sha}
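
Equally untested, but the blob and tree imports could be wrapped in a loop that also checks the result. This is a sketch under assumptions: import_objects is a made-up name, and the export directory is assumed to be laid out as <dir>/blob/<sha> and <dir>/tree/<sha>. Since the file name is the original hash, comparing it with what git prints tells you whether the text round trip changed anything.

```shell
# Hypothetical helper: import blob and tree files from an export directory.
# $1 = export directory containing blob/ and tree/ subdirectories.
# Returns non-zero if any recreated object got a different hash.
import_objects() (
    dir=$1
    status=0
    for type in blob tree; do
        for f in "$dir/$type"/*; do
            [ -e "$f" ] || continue   # glob matched nothing
            if [ "$type" = blob ]; then
                new=$(git hash-object -w --stdin < "$f")
            else
                # --missing lets a tree be written before the objects it lists.
                new=$(git mktree --missing < "$f")
            fi
            if [ "$new" != "${f##*/}" ]; then
                echo "hash mismatch: $f became $new" >&2
                status=1
            fi
        done
    done
    exit "$status"
)
```

Run it inside the destination repository as import_objects <path>.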

For commits, I don't know of a low-level command that directly reads in git cat-file -p output. The format is regular, so parsing it shouldn't be difficult. Extract the fields from the following

tree ca1fdb46ebf7f2c5dece94f21cec3385e22fb6dd 
parent d685ab16fe1ecf67b77b36bda711d8189f3a54a3 
author Andrew <andrew@nowhere.com> 1411616364 -0700 
committer Andrew <andrew@nowhere.com> 1411616364 -0700

commit message

then do

GIT_AUTHOR_NAME="Andrew" GIT_AUTHOR_EMAIL=andrew@nowhere.com (etc. for dates and committer info) git commit-tree ${TREE_SHA} -p ${PARENT_SHA} [ more parents if needed ] -m "MESSAGE"

Parse all that and pass to git commit-tree unless you have something better.
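
As a rough, untested illustration of that parsing, something like the following might do. The name import_commit is made up, and the sketch assumes the file contains only tree, parent, author and committer headers (no gpgsig, encoding or mergetag) followed by a blank line and the message.

```shell
# Hypothetical helper: rebuild one commit from a file holding
# `git cat-file -p <commit>` output. Prints whatever git commit-tree prints.
import_commit() (
    set -e
    file=$1
    # Headers run up to the first blank line.
    headers=$(sed '/^$/q' "$file")
    tree=$(printf '%s\n' "$headers" | sed -n 's/^tree //p')
    # Each "parent <sha>" header becomes a "-p <sha>" argument pair.
    parents=$(printf '%s\n' "$headers" | sed -n 's/^parent /-p /p')
    author=$(printf '%s\n' "$headers" | sed -n 's/^author //p')
    committer=$(printf '%s\n' "$headers" | sed -n 's/^committer //p')

    # Split "Name <email> timestamp tz" into name, email and date.
    GIT_AUTHOR_NAME=${author%% <*}
    rest=${author#*<}
    GIT_AUTHOR_EMAIL=${rest%%>*}
    GIT_AUTHOR_DATE=${rest#*> }
    GIT_COMMITTER_NAME=${committer%% <*}
    rest=${committer#*<}
    GIT_COMMITTER_EMAIL=${rest%%>*}
    GIT_COMMITTER_DATE=${rest#*> }
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
    export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE

    # Everything after the first blank line is the message, fed on stdin.
    # $parents is left unquoted on purpose so it splits into arguments.
    sed '1,/^$/d' "$file" | git commit-tree "$tree" $parents
)
```

Usage would be along the lines of new_sha=$(import_commit <path>/commit/${sha}). If every header and the message are reproduced byte for byte, the new hash should come out equal to the old one; I have not verified that.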

For tags you should be able to do

cat <path>/tag/${sha} | git mktag > .git/refs/tags/${TAG_NAME}

Then in addition to that you'd need to pass over all your refs and whatnot, which should be text files already so that's a direct copy.
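
If copying the ref files directly turns out to be awkward (packed refs, for instance), the refs can also travel as a plain text listing. Again just a sketch: export_refs and import_refs are made-up names, and it assumes the recreated objects kept their original hashes. If they did not, the listing would need an old-to-new hash mapping first. HEAD itself is not covered.

```shell
# Source side: print every ref as "<sha> <refname>", one per line.
export_refs() {
    git for-each-ref --format='%(objectname) %(refname)'
}

# Destination side: read that listing on stdin and recreate each ref.
import_refs() {
    while read -r sha ref; do
        git update-ref "$ref" "$sha"
    done
}
```

On the source that would be export_refs > refs.txt, and on the destination import_refs < refs.txt.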

Good luck. ~A

Andrew C