Copy file and its entire history

Question

Another developer and I are developing an API that is accessed by other code. As we change the API's behaviour to better suit our needs, we release additional versions without deprecating the old ones, so that existing applications using the API do not have to update immediately. For example:

$ cat 0.5.php
<?php
$username = $_POST['name'];
?>

$ cat 0.6.php
<?php
$username = $_POST['username'];
?>

When we start a new version, we typically cp version N-1.php to N.php and code from there. However, if we do this with Git, we lose the entire blame, diff, and other history of the file for comparison and reverting. How can I 'forge' the history of the old file into the new file so that blame, log, diff, and similar commands "just work" without passing them additional flags or arguments such as --follow?

I might be nitpicking here, but isn't the **point** of Git to handle your versions? So that you don't need to resort to stuff like "change filenames" or "copy history"? — Madara's Ghost, Jun 27 '13 at 12:39
The idea is that multiple versions of the API will be available to users simultaneously. I'm not going to have the API users SSH into my server and checkout a commit as per the version of the API that they happen to need. — dotancohen, Jun 27 '13 at 12:50
That can be done using branches, see the PHP source for example. — Madara's Ghost, Jun 27 '13 at 12:55

John Szakmeister · Answer 1 · 2013-07-24T09:25:30.573

10

You want to use the -C flag. That will detect copies as well as renames, so it can follow the history. diff, blame, and log accept this flag.
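
As a rough illustration of the flag in use, the helpers below wrap the three commands. The function names are invented for this sketch, and `--follow` and `--find-copies-harder` are extras that the answer itself does not mention, so treat this as one possible way to call them rather than the required form:

```shell
# Hypothetical helpers (names invented for this sketch) showing -C with
# log, blame, and diff. Run them inside the repository.

# History of a file with rename/copy detection switched on.
# --follow only works when a single path is given.
log_with_copies() {
    git log --follow -C --oneline -- "$1"
}

# Per-line authorship. Giving -C twice also looks for lines copied from
# other files in the commit that created the file.
blame_with_copies() {
    git blame -C -C -- "$1"
}

# Name/status summary between two commits, with copies reported as "C".
# --find-copies-harder also considers unmodified files as copy sources.
diff_with_copies() {
    git diff -C --find-copies-harder --name-status "$1" "$2"
}
```

For example, `diff_with_copies HEAD~1 HEAD` right after committing a plain `cp 0.5.php 0.6.php` should list 0.6.php as a copy of 0.5.php instead of as a brand new file.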

Like @madara-uchiha said though: you should look at using tags, and maybe generating your git-x.y files from them instead. You can use something like the following to fetch the contents of a file at a given tag:

git show v0.6:git.php > git-0.6.php

Where v0.6 is the tag you're interested in.

Update:

Here's a small script to do it. This first one assumes your tags are of the form x.y or x.y.z:

#!/bin/bash
# Export git.php as it existed at each tag named x.y or x.y.z.
versions=$(git tag -l | grep -E '^[0-9]+\.[0-9]+(\.[0-9]+)?$')
for version in $versions
do
    git show "$version:git.php" > "git-$version.php"
done

If your tags are of the form vX.Y or vX.Y.Z and you want git-x.y.php or git-x.y.z.php as the filename, this works:

#!/bin/bash
# Export git.php at each tag named vX.Y or vX.Y.Z, dropping the leading "v".
versions=$(git tag -l | grep -E '^v[0-9]+\.[0-9]+(\.[0-9]+)?$')
for version in $versions
do
    git show "$version:git.php" > "git-${version#v}.php"
done

Run this script as part of your release process, and it will generate all the versions for you. Also, it's pretty easy to drop the git- from the name. For example, > git-$version.php becomes > $version.php.
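
If the release process needs to cope with both tag styles at once, something along these lines might do. It is only a sketch: `export_versions` and its two arguments are names made up for this example, it assumes the file lives at the root of the repository, and the `git cat-file -e` check (which skips tags created before the file existed) is an addition that the scripts above do not have:

```shell
# Sketch: write FILE as it existed at every release tag into OUT_DIR.
# export_versions and its arguments are hypothetical names.
# Usage: export_versions git.php public
export_versions() {
    file=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    # Accept both x.y[.z] and vx.y[.z] tag names.
    git tag -l | grep -E '^v?[0-9]+\.[0-9]+(\.[0-9]+)?$' |
    while IFS= read -r tag; do
        # Skip tags where the file does not exist, so no empty output is written.
        git cat-file -e "$tag:$file" 2>/dev/null || continue
        # git.php at tag v0.6 becomes OUT_DIR/git-0.6.php
        git show "$tag:$file" > "$out_dir/${file%.*}-${tag#v}.${file##*.}"
    done
}
```

Writing into a separate directory keeps the generated files out of the working tree, so they are not committed by accident.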

edited Jul 24 '13 at 09:25

answered Jun 27 '13 at 12:49


For annotate, you can specify the -C option two or three times, and it will look for copies from other files. More info is in `git help annotate`. – kan Jun 27 '13 at 13:21
Thank you, however I am looking for a solution which will work without adding additional flags to the `blame`, `diff`, `log`, and other Git commands. I have clarified the question and added a bounty. – dotancohen Jul 17 '13 at 12:17
Git has no ability to copy and follow history without additional flags. The mechanism simply doesn't exist. It's a "feature" of Git that you don't have to explicitly tell it about moves or copies (it just figures them out after the fact). The downside is that, since it uses heuristics to determine this, the feature is disabled by default because it can be a time-consuming operation, depending on the commits. I think your best bet is to name your file `git.php` in the repository and generate your versioned `git.php` files from the tags. Then you don't need options to see all the history. – John Szakmeister Jul 17 '13 at 13:24
Thank you, this is a clever idea for having a running 'dev' git.php file that can be diff'ed, blame'd, etc. and then branched off of. Unfortunately the scheme cannot be retroactively added to a project, but I'll consider it as a possible solution while I review some of the other ideas presented here. Thank you! – dotancohen Jul 25 '13 at 06:04

DaveRandom · Answer 2 · 2013-06-27T13:47:11.660

I think you are clouding this a little, in that it seems that you are attempting to combine the way that version control handles things with the way that the API is exposed (i.e. how the web server handles things).

In order for multiple versions of the API to work simultaneously, the consumer presumably needs to specify which version they want to use for a given call. For the purposes of this answer I'll assume you are working in a similar way to the Stack Exchange API, so that the version is specified as the first "directory" component of the API URL (e.g. for version 1.5 I direct my request to http://domain.tld/1.5/call, for version 1.6 I use http://domain/1.6/?method=call, etc.). But really this element doesn't matter, as long as you have some mechanism for determining the appropriate version and routing the request to the correct controller at the web server level.

Version control

The approach I would take here is fairly simple. Every version gets its own branch in the repository. Any development work performed against that version is either done in a branch from the version's branch, or committed directly to the version. Master always contains the most recent stable release.

For example, let's say the current release is 1.5 and everything is currently under master and you have no historical branches. Draw a line under the current stable code, and create a branch called 1.5. Now, to start development on 1.6, which will build on the 1.5 branch, create a new branch from master and call it 1.6.

Any development that works towards 1.6 happens in the 1.6 branch, or other branches created using 1.6 as a base. This means everything can be nice and cleanly push/pulled into the 1.6 branch as appropriate.

If you need to apply a small bugfix in the 1.5 release, you can easily do this in the 1.5 branch. If you want to pull a commit from the 1.6 branch, you will need to "cherry-pick" it. Since the branches have started to diverge, any such issues need to be dealt with manually to ensure maximum safety for the "stable" codebase.

When the time comes to create 1.7/2.0/whatever, pull the 1.6 release into master, tag it, and create a new branch for the new version.

In this manner, a complete history of who did what and when for each version/release is stored in the branches. As mentioned by others, don't forget to tag your milestone releases.
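
Put into commands, the flow described above could look roughly like the following. The function names are invented for this sketch, it assumes the main branch is called master, and it assumes tags get a "v" prefix so that a tag never shares a name with a version branch:

```shell
# Sketch of the branch-per-version flow; function names are hypothetical.

# Freeze the current stable code as OLD and open NEW for development,
# e.g. start_version 1.5 1.6
start_version() {
    git branch "$1" master &&      # maintenance branch for the current release
    git checkout -b "$2" master    # development branch for the next release
}

# Copy a single fix from the development branch onto a release branch,
# e.g. backport_fix 1.5 <commit>
backport_fix() {
    git checkout "$1" &&
    git cherry-pick -x "$2"        # -x records the original commit id in the message
}

# Publish a finished version: merge it into master and tag the milestone,
# e.g. release_version 1.6
release_version() {
    git checkout master &&
    git merge --no-ff -m "Release $1" "$1" &&
    git tag -a "v$1" -m "Version $1"
}
```

If a cherry-pick conflicts, Git stops and leaves the conflict in the working tree, which is the manual step mentioned above.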

Web server

With the above approach, the web server setup is fairly trivial to maintain. The root of each release is simply synced with the appropriate branch.

So, for the sake of simplicity let's imagine that the root directory of the repository in version control corresponds to the document root of the API code (in reality this is unlikely to be the case, but a bit of URL rewriting or similar approaches can resolve this).

In the document root for the domain on the web server, we create the following directory structure:

<document-root>
    |
    |--- 1.5
    |
    |--- 1.6

Into each of the 1.5, 1.6 directories we clone the repository from central version control, and switch to the appropriate branch. Every time you wish to push a change live, simply pull down the changes from version control in the appropriate branch.
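
A minimal sketch of that clone-or-pull step, assuming one branch per version as described. `deploy_version` and its three arguments are placeholder names, not part of any existing tool:

```shell
# Sketch: keep one checkout per served version under the document root.
# deploy_version and its arguments are hypothetical names.
# Usage: deploy_version <repo-url> <document-root> <version>
deploy_version() {
    repo_url=$1 doc_root=$2 version=$3
    target=$doc_root/$version
    if [ -d "$target/.git" ]; then
        # Existing checkout: fast-forward only, so a checkout that has
        # diverged on the server fails loudly instead of being merged.
        git -C "$target" pull --ff-only origin "$version"
    else
        # First deployment: clone just the branch for this version.
        git clone --branch "$version" --single-branch "$repo_url" "$target"
    fi
}
```

Running it by hand for each release, rather than from cron, keeps a human in the loop.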

In a high-volume environment you might have a whole server dedicated to serving each version, with the version identifier as a subdomain, but the same general principle applies, except that the repository can be cloned straight into each server's document root.

A lot (if not all) of the process of creating the directories for new branches, cloning the repo into it and switching to the appropriate branch, as well as pulling down patches/bugfixes for releases can be automated with scripts/cron etc, but before you do this don't forget: pushing changes to a live server without human involvement often ends in tears.

An alternative approach

...would be to create a single parent repository that serves as the document root for the domain. In this you would create submodules in the root of the repository for each version. The overall effect this would create would be quite similar, but have the "advantage" of only having to sync a single repository on the server, and keeping the web server's directory structure defined by version control. However, personally, I don't like this approach, for a couple of reasons:

Submodules are a pain to maintain. They are attached to a particular commit, and it's easy to forget that.
I believe the control afforded by the branch-driven approach is more granular, and clearer as to exactly what is going on.

I accept that both of those reasons are largely personal preference though, which is why I bring it up as a possibility.

score 2 · Accepted Answer · answered Jul 23 '13 at 03:32

This is a goofy hack but it seems to come close to the behavior you want. Note, it assumes that you have tagged the earliest commit of 0.5.php you care about as first:

1. Branch:

% git checkout -b tmp

2. Make a patches folder and create patches from your 0.5.php file's commit history:

% mkdir patches && git format-patch -o patches first -- 0.5.php

3. Delete your file and check out the first copy of it:

% rm 0.5.php && git checkout first -- 0.5.php

4. Rename your file:

% mv 0.5.php 0.6.php

5. Tweak your patch files to use the new name:

% sed -i 's/0\.5\.php/0.6.php/g' patches/0*

6. Commit (if you haven't already a couple of times):

% git add -A && git commit -m 'ready for history transfer'

7. Apply the patches:

% git am -s patches/0*

8. Go back to master, pull the new file over, and delete the tmp branch:

% git checkout master && git checkout tmp -- 0.6.php && git branch -D tmp

Voila! Your 0.6.php file now has a history that replicates your 0.5.php file's, except that each commit in 0.6.php's history will have a different ID from those in 0.5.php's history. Times and blame should still be correct. With a little extra effort you could probably put all of that in a script and then alias the script to git cp.

Thank you, if I can get this working then it is the best solution presented. However, I cannot get it to work! Assuming that I tag each commit, when I use the initial commit's tag, then the new file does not have the old file's history. When I use the current tag, then the patch file is not created. — dotancohen, Jul 25 '13 at 06:56
I cannot yet get any of the answers to work, but this looks the most promising. Thus, I am awarding the bounty here. Thanks! – dotancohen, Jul 25 '13 at 10:42
You shouldn't need to tag each commit. The only tag I used in the example above was to provide an easy handle for the earliest/oldest commit in the history I wanted to copy. Using the current commit won't create a patch because there's no change. It is your current (recorded) state. When you used the oldest commit in your file's history, and you got to step 6 did you have a patches subdirectory with a number of patch files in it? — Alnilam, Jul 27 '13 at 12:31

score 2 · Answer 4 · edited May 23 '17 at 11:45

WARNING: The following command rewrites history which is most often undesirable on shared repositories. Think before you push -f.

Rewrite your history to include a second copy of the file with git filter-branch:

git filter-branch --tree-filter 'if [ -f 0.5.php ]; then cp 0.5.php 0.6.php; fi' HEAD

Now 0.6.php is an EXACT duplicate of 0.5.php over all of history.
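
One way to wrap the rewrite and check the result is sketched below, under the same caveats as the warning above. The function names are invented for this example, the file names must not contain single quotes, and the working tree has to be clean before filter-branch will run:

```shell
# Sketch: duplicate OLD as NEW in every commit that contains OLD.
# copy_with_history and count_commits are hypothetical names.
# Usage: copy_with_history 0.5.php 0.6.php
copy_with_history() {
    old=$1 new=$2
    # The variable below silences the notice that newer Git versions
    # print before filter-branch starts.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --tree-filter \
        "if [ -f '$old' ]; then cp '$old' '$new'; fi" HEAD
}

# Number of commits on the current branch that touch a path.
count_commits() {
    git rev-list --count HEAD -- "$1"
}
```

After the rewrite, `count_commits 0.5.php` and `count_commits 0.6.php` should print the same number, and plain `git log -- 0.6.php` needs no extra flags.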

Your co-worker then needs to handle the new history. See: "How do I recover/resynchronise after someone pushes a rebase or a reset to a published branch?"

score 1 · Answer 5 · answered Jun 27 '13 at 12:51


Check out git subtree. With that you should be able to split off the part of the history that contains that one file. You can duplicate it too, if nothing else by using interactive rebase. Then you can merge it back and have the duplicate.


Balog Pal


Thanks. I seem to be missing the 'magic glue' to get a copied file back into the main branch. I'm trying to use the main branch as a submodule of itself in order to copy the file. If this is not what you meant, then I'd like to know what you do mean. Thank you. – dotancohen Jul 25 '13 at 06:27

score 1 · Answer 6 · answered Jun 27 '13 at 14:21

A more standard way would be to only have one api.php file, and use branches and tags to mark new versions.

As for serving the files: if you want to offer several versions of your API to your users, use some deployment process to check out and build specific versions of your API, rename and move them as you wish, and give public access to that, not to your dev tree.
