I asked this question on #git earlier, but as it's reasonably substantial I'll post it here. I want to run a filter-branch on a repo to modify thousands of files over hundreds of commits using a Python script. I'm calling the clean.py script with the following command in the repo directory:
git filter-branch -f --tree-filter '(cd ../cleaner/ && python clean.py --path=files/*/*/**)'
clean.py looks like this and modifies all files matching the path (i.e. files/*/*/**):
from os import environ as environment
import argparse, yaml
import logging
from cleaner import Cleaner

parser = argparse.ArgumentParser()
parser.add_argument("--path", help="path to run cleaner on", type=str)
args = parser.parse_args()

# logging.basicConfig(level=logging.DEBUG)

with open("config.yml") as sets:
    config = yaml.load(sets)

path = args.path
if not path:
    path = config["cleaner"]["general_pattern"]

cleaner = Cleaner(config["cleaner"])
print "Cleaning path: " + str(path)
cleaner.clean(path, True)
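To make the path selection easier to check in isolation, here is a minimal sketch of the same fallback logic wrapped in a function. The name `resolve_path` and the sample config value are hypothetical and used only for illustration; they are not part of clean.py.

```python
import argparse


def resolve_path(argv, config):
    """Return --path if given, else the general pattern from the config.

    `resolve_path` is a hypothetical helper; it mirrors the fallback in clean.py.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--path", help="path to run cleaner on", type=str)
    args = parser.parse_args(argv)
    # An empty or missing --path falls back to the configured pattern.
    return args.path or config["cleaner"]["general_pattern"]


if __name__ == "__main__":
    # Placeholder config value, assumed for the example only.
    sample_config = {"cleaner": {"general_pattern": "files/**"}}
    print(resolve_path(["--path=files/*/*/**"], sample_config))
    print(resolve_path([], sample_config))
```

Called with the argument used in the tree filter, this should hand the literal pattern files/*/*/** to the cleaner, which matches the "Cleaning path" line printed during the rewrite.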
After running the command, the following is output to the terminal:
$ python deploy.py --verbose
INFO:root:Checked out master branch
INFO:root:Running command:
'git filter-branch -f --tree-filter '(cd C:/Users/Graeme/Documents/programming/clean-cdn/clean-jsdelivr/ && python clean.py --path=files/*/*/**)' -d "../tmp"' in ../jsdelivr
Rewrite 298ec3a2ca5877a25ebd40aeb815d7b5a5f33a7e (1/1535)
Cleaning path: files/*/*/**
C:\Program Files (x86)\git/libexec/git-core\git-filter-branch: line 343: ../commit: No such file or directory
C:\Program Files (x86)\git/libexec/git-core\git-filter-branch: line 346: ../map/298ec3a2ca5877a25ebd40aeb815d7b5a5f33a7e: No such file or directory
could not write rewritten commit
rm: cannot remove `/c/Users/Graeme/Documents/programming/clean-cdn/tmp/revs': Permission denied
rm: cannot remove directory `/c/Users/Graeme/Documents/programming/clean-cdn/tmp': Directory not empty
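The log above shows the command being launched by a wrapper (deploy.py) with an extra -d option. As a rough sketch of how such a wrapper might build and run the command, assuming it uses `subprocess`: the function names here are hypothetical and this is not the actual deploy.py.

```python
import subprocess


def build_filter_branch_command(cleaner_dir, pattern, temp_dir=None):
    """Return the argument list for the filter-branch call (hypothetical helper)."""
    # The tree filter is a shell snippet that git evaluates once per commit.
    tree_filter = "(cd {0} && python clean.py --path={1})".format(cleaner_dir, pattern)
    command = ["git", "filter-branch", "-f", "--tree-filter", tree_filter]
    if temp_dir is not None:
        # -d sets the temporary directory filter-branch uses for the rewrite.
        command += ["-d", temp_dir]
    return command


def run_filter_branch(repo_dir, cleaner_dir, pattern, temp_dir=None):
    """Run the rewrite inside repo_dir and return git's exit code."""
    command = build_filter_branch_command(cleaner_dir, pattern, temp_dir)
    # A list without shell=True passes the tree filter to git as one argument.
    return subprocess.call(command, cwd=repo_dir)
```

Passing the arguments as a list avoids a second layer of shell quoting around the tree filter, which makes it easier to see exactly what git receives.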
The Python script (clean.py) executes successfully and modifies the files correctly, but the filter-branch doesn't finish fixing up the commit. There appears to be a permission issue, but I haven't been able to get around it by running with elevated privileges. I've tried running the filter-branch on Windows 7, Windows 8, and Ubuntu with git v1.8 and v1.9.
Edit: The script works as is on CentOS with git 1.7.1.
The goal is to reduce the size of a CDN's repo (nearing 1 GB) after the contents of files/*/*/** finish syncing with a database.
The source code of the project
Target repo for the rewrite