55

We have a project with around 500,000 lines of code, managed with git, much of it several years old. We're about to make a series of modifications to bring the older code into conformance with the developer community's current standards and best practices with regard to naming conventions, exception handling, indentation, and so forth.

You can think of it as something between pretty printing and low level/mechanical refactoring.

This process is likely to touch almost every line of code in the code base (~85%), and some lines will be subject to as many as five modifications. All of the changes are intended to be semantically neutral.

  • Is there any way to make the changes transparent to git blame, etc. so that when looking at the code a month from now we'll see the commit the logic was introduced in, not the one in which the indentation or capitalization was changed?
  • What's the best way to pull merges from forks that have not undergone this process? My present plan would be to have a script clone the forked repo, apply the automated process to it and its base, diff them, then apply the diff. But I'd love to have a cleaner answer.
  • Are there any other problems of this sort that I'm not seeing, and if so, what can be done to mitigate them? I'm figuring that git bisect, etc. should be fine, that git log, etc. crossing the great divide will be annoying unless you are careful, and that git diff across it will be hopeless, but I'm not convinced I'm not overlooking another pain point.
  • MarkusQ

    4 Answers

    27

    I don't know how best to deal with some of the more invasive changes you're describing, but...

    The -w option to git blame, git diff, and others causes git to ignore changes in whitespace, so you can more easily see the real differences.
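The effect of -w on blame can be checked in a throwaway repository. The script below is a minimal sketch, not part of the original answer: the file name, commit messages, and variable names are all made up for the demonstration. It commits a line, re-indents it in a second commit, and then asks git blame who owns the line, with and without -w.

```shell
#!/bin/sh
# Sketch only: builds a scratch repository to compare blame with and without -w.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

# Commit 1 introduces the logic (no indentation yet).
printf 'int f() {\nreturn 1;\n}\n' > f.c
git add f.c
git commit -q -m 'add logic'
logic_commit=$(git rev-parse HEAD)

# Commit 2 changes nothing but the indentation of line 2.
printf 'int f() {\n    return 1;\n}\n' > f.c
git commit -q -am 'reindent'
reindent_commit=$(git rev-parse HEAD)

# --porcelain prints the full commit id as the first field of the first line.
plain=$(git blame --porcelain -L 2,2 f.c | head -n 1 | cut -d' ' -f1)
ignoring_ws=$(git blame -w --porcelain -L 2,2 f.c | head -n 1 | cut -d' ' -f1)

echo "logic commit:    $logic_commit"
echo "reindent commit: $reindent_commit"
echo "blame:           $plain"
echo "blame -w:        $ignoring_ws"
```

Without -w the line is attributed to the re-indenting commit; with -w it is attributed to the commit that introduced the logic. This only covers whitespace-only edits, so renames or changes to exception handling would still show up as new authorship.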

    Phil
      And `-M` / `-C` options to `git diff` and `git blame` make it follow renames and copies; in the case of `git blame` also moving and copying of fragments of code across files. – Jakub Narębski Dec 01 '09 at 15:40
    13

    I would recommend making those changes one step at a time, in a central Git repo (central as in "public reference for all other repositories to follow"):

    • indentation
    • then reordering methods
    • then renaming
    • then ...

    But not "indentation-reordering-renaming-...-one giant commit".

    That way, you give to Git a reasonable chance to follow the changes across refactoring modifications.

    Plus, I would not accept any new merge (pulled from another repo) that has not had the same refactoring applied before its code was pushed.
    If applying the format process changes the fetched code in any way, you could reject it and ask the remote repo to conform to the new standards first (at least by pulling from your repo before pushing again).
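The acceptance check described above (re-apply the formatter and reject if anything changes) could look roughly like the following. This is a sketch under stated assumptions, not the answer's actual tooling: `reformat` is a stand-in that merely strips trailing whitespace from `*.c` files, `is_conformant` is a hypothetical helper name, and the scratch repository exists only to exercise the check.

```shell
#!/bin/sh
# Sketch only: "reformat" stands in for the real, project-specific formatter.
set -e

# Hypothetical formatter: strip trailing whitespace from every tracked *.c file.
reformat() {
    git ls-files '*.c' | while IFS= read -r f; do
        sed 's/[[:space:]]*$//' "$f" > "$f.tmp"
        # Only rewrite the file when the formatter actually changed it.
        if cmp -s "$f" "$f.tmp"; then rm "$f.tmp"; else mv "$f.tmp" "$f"; fi
    done
}

# Succeeds only if re-applying the formatter leaves the checked-out tree unchanged.
is_conformant() {
    reformat
    if git diff --quiet; then
        return 0
    fi
    git checkout -q -- .   # discard the formatter's edits again
    return 1
}

# Scratch repository to exercise the check.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

printf 'int x;   \n' > a.c          # trailing spaces: not yet conformant
git add a.c
git commit -q -m 'unformatted contribution'
if is_conformant; then before=accepted; else before=rejected; fi

reformat
git commit -q -am 'apply formatter'
if is_conformant; then after=accepted; else after=rejected; fi

echo "before formatting: $before"
echo "after formatting:  $after"
```

In a real setup the same comparison would run wherever contributions are accepted, for example in a server-side hook or in the script that imports patches.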

    VonC
    • We've been leaning that way too. As for not accepting patches unless they're rebased against the post-transform code, that isn't really a viable option; "rebasing" across such a change would amount to manually retyping most or all of the changes (think how the inevitable merge would go if most lines had been changed). So we considered making them run the refactor tool themselves, but since this process could be automated, why not run it ourselves on patch acceptance rather than making them learn it and adding an extra hurdle on contributions? – MarkusQ Dec 01 '09 at 16:08
    • @MarkusQ: I agree on the principle, but just to be sure: I wasn't talking about a mandatory "rebase" to be done on the client side, only about a mandatory "reformat" to be done by the client before pushing (with the reformat checked in a hook on the server side, by re-applying the same reformat and checking that the result is the same as the file received). That will avoid many merge conflicts when those same clients fetch the central public repo changes and rebase their work on top of it. – VonC Dec 01 '09 at 16:57
    10

    You will also need a mergetool that allows aggressive ignoring of whitespace. p4merge does this, and is freely downloadable.

    krosenvold
    • Probably because the question is about git and how it will handle the changes, and not necessarily about other tools for helping with the work. – hlovdal Jan 25 '10 at 11:32
    0

    This question has a good solution for it. In brief: use git filter-branch.

    I used this command myself:

    git filter-branch --tree-filter "git diff-tree --name-only --diff-filter=AM -r --no-commit-id \$GIT_COMMIT | grep -E '\.(cpp|h)\$' | xargs ./emacs-script" HEAD

    Here ./emacs-script is a script I wrote that uses Emacs to change the code style; it simply calls indent-region on each file.

    This works fine as long as no files have been deleted from the repository. In that situation --ignore-unmatch may be helpful, but I'm not sure.
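The file-name filter in the command above can be tried on its own before rewriting any history. The sketch below assumes the intent is "files ending in .cpp or .h" and feeds a few invented paths through an anchored pattern; both the paths and the pattern are illustrative.

```shell
#!/bin/sh
# Sketch only: the sample paths are invented for the demonstration.
strict=$(printf '%s\n' src/main.cpp include/util.h README.md scripts/push.sh docs/hello.txt |
    grep -E '\.(cpp|h)$')

# Only paths whose names end in .cpp or .h are passed on to the formatter.
echo "$strict"
```

An unanchored pattern such as `.*h` would also let through any path that merely contains the letter h, so a check like this is worth running before the filter is applied to every commit.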

    motam
    • `git filter-branch` as used in that answer will rewrite the whole history to pretend the code was never written in violation of the current standards and best practices. That means you would need to "back-port" the improvements to all individual commits of the past, instead of making a single commit to reformat only the current state once. – sschuberth Oct 02 '15 at 07:36