We are migrating large CVS repositories to Git with cvs2git. For an in-house tool, we need a mapping from CVS revision numbers to Git commit hashes for some files.

cvs2svn has a --cvs-revnums option, but it stores these revision numbers only in Subversion file properties, so they are not available in Git.

I saw that git cvsimport -R creates this mapping, but it has many other disadvantages compared to cvs2git.

Is there any other way to get the information that --cvs-revnums provides?

SebW

2 Answers

If I understand correctly, you want a way to answer the question "what is the first Git commit that includes CVS revision X.Y of file FOO?".

If you turn on cvs2git verbose output ("-v"), then cvs2git displays, during CreateRevsPass, the CVS file revisions that are being added to each Git commit:

CVS Revision grouping:
  Time: Fri May 23 02:31:36 2003
Creating Subversion r23 (commit)
 proj/default 1.2.2.1
 proj/sub1/default 1.2.2.1
 proj/sub2/subsubA/default 1.1.2.1
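
As a rough illustration, verbose output of this shape could be collected into a table keyed by pseudo-Subversion revision number. This is only a sketch: the function name `parse_revision_groups` is made up, and the regular expressions assume the log lines look exactly like the excerpt above, which may differ between cvs2svn versions.

```python
import re
from collections import defaultdict

# Assumed line shapes, copied from the log excerpt above.
COMMIT_RE = re.compile(r"^Creating Subversion r(\d+) \(")
# An indented line ending in a dotted CVS revision number, e.g. " proj/default 1.2.2.1".
FILE_RE = re.compile(r"^\s+(\S.*?)\s+(\d+(?:\.\d+)+)\s*$")


def parse_revision_groups(lines):
    """Return {pseudo_svn_revision: [(path, cvs_revision), ...]}."""
    groups = defaultdict(list)
    current = None  # pseudo-Subversion revision whose file list we are reading
    for line in lines:
        m = COMMIT_RE.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = FILE_RE.match(line)
        if m and current is not None:
            groups[current].append((m.group(1), m.group(2)))
        elif not line.startswith(" "):
            # Any other unindented line ends the current file list.
            current = None
    return dict(groups)
```

Feeding it the saved log line by line (for example `parse_revision_groups(open("cvs2git.log"))`, where the file name is hypothetical) yields one entry per commit that listed file revisions.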

This is close to what you want. But it is not quite enough information to generate your table, because there is no easy way to map the pseudo-Subversion revision numbers (like "r23") to Git commit hashes. In fact, this is not trivial because cvs2git doesn't create the Git hashes itself, but rather just writes them in an abstract form to "git fast-import", which creates the commits and computes their hashes.

Tellya what I'm gonna do...

I just made a change to the trunk version of cvs2svn which causes OutputPass to emit a little bit more information, namely, which "mark" corresponds to which pseudo-Subversion revision number. The output for the above commit looks like this:

Writing commit r23 on Branch('B_MIXED') (mark :1000000021)

The mark ":1000000021", in turn, can be converted into a Git SHA-1 by asking "git fast-import" to write its marks to a file:

cat ../git-blob.dat ../git-dump.dat | git fast-import --export-marks=FILENAME

Look in the resulting file for a line that looks like this:

:1000000021 0aa255270fbb94ad691d5391a6d37c2ee6d78b03

from which you can read off the Git hash.
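
To sketch how these two pieces might be joined: the snippet below reads the "Writing commit" lines and the exported marks file and produces a table from pseudo-Subversion revision number to Git hash. The function names are invented for illustration, and the regular expression assumes the log line looks exactly like the one shown above.

```python
import re

# Assumed shape of the OutputPass line shown above.
WRITE_RE = re.compile(r"^Writing commit r(\d+) on .* \(mark (:\d+)\)")


def parse_marks(lines):
    """Parse a marks file with lines of the form ':<mark> <sha1>'."""
    marks = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(":"):
            marks[parts[0]] = parts[1]
    return marks


def svn_rev_to_git_hash(log_lines, marks_lines):
    """Return {pseudo_svn_revision: git_sha1} for every mark found in both inputs."""
    marks = parse_marks(marks_lines)
    result = {}
    for line in log_lines:
        m = WRITE_RE.match(line)
        if m and m.group(2) in marks:
            # If a revision number appears more than once, the last line wins.
            result[int(m.group(1))] = marks[m.group(2)]
    return result
```

Marks that appear in only one of the two inputs are skipped silently here; depending on your needs you might prefer to report them.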

You still have a bit of work to do to pull all of this information together, but now at least it should in principle be possible.

Please note that this method will only tell you the first Git commit containing the CVS file revision. It will not tell you when that file revision was merged to other branches. And in fact, because of the impedance-mismatch between CVS and Git, you cannot rely on the Git commit ancestry graph to tell you that information. So there would be a lot more work to do to make this into a complete, convenient feature.

Hope that helps.

mhagger

Thank you for your answer!

I have now finished the migration from CVS to Git successfully.

Because we needed the mapping between Git hashes and CVS revisions for only a couple of files in each repository, I solved the problem in a way that was a bit easier for me:

  1. Migrate the CVS repository from CVS to Git with cvs2svn.
  2. For file A: retrieve all CVS commits from all branches from the CVS server, ordered chronologically in a single list.
  3. Retrieve all commits for file A from all branches in Git, ordered chronologically in a single list (ignoring commits with the message "This commit was manufactured by cvs2svn").
  4. Check that both lists contain exactly the same number of commits (to make sure that nobody has made newer commits to CVS).
  5. Map each CVS revision to the Git hash at the same position in the other list.

This gave us, for every CVS revision, the first Git commit containing that CVS file revision. It worked for us because, for any one file, we had no CVS commits with the same timestamp on different branches.
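
The pairing in steps 3 to 5 could look roughly like the following. This is a sketch, not the script we actually ran: the function name and the tuple layouts are made up for illustration, and collecting the two input lists (from the CVS log and the Git log of file A) is left out.

```python
MANUFACTURED = "This commit was manufactured by cvs2svn"


def pair_revisions(cvs_revisions, git_commits):
    """Pair CVS revisions with Git commits of one file by chronological position.

    cvs_revisions: list of (timestamp, cvs_revision)
    git_commits:   list of (timestamp, git_sha1, commit_message)
    Returns {cvs_revision: git_sha1}.
    """
    # Step 3: drop the commits that cvs2svn manufactured itself.
    real_commits = [c for c in git_commits if MANUFACTURED not in c[2]]

    cvs_sorted = sorted(cvs_revisions, key=lambda r: r[0])
    git_sorted = sorted(real_commits, key=lambda c: c[0])

    # Step 4: the positional pairing is only valid if both lists have equal length.
    if len(cvs_sorted) != len(git_sorted):
        raise ValueError(
            "CVS has %d revisions but Git has %d commits"
            % (len(cvs_sorted), len(git_sorted))
        )

    # Step 5: same position in both lists means same change.
    return {rev: sha for (_, rev), (_, sha, _) in zip(cvs_sorted, git_sorted)}
```

As noted, this relies on no two CVS commits of the same file sharing a timestamp across branches; with such ties the sort order, and therefore the pairing, would be ambiguous.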

SebW