Similar to this link, but for Mercurial. I'd like to find the files that contribute most to the size of my Mercurial repository.

I intend to use hg convert to create a new, smaller repository. I'm just not sure yet which files are contributing to the repository size. They could be files that have already been deleted.

What is a good way to find these files anywhere in the repository history? There are over 20,000 commits. I'm thinking of a PowerShell script, but I'm not sure what the best approach is.

Michael

1 Answer

Check hg help fileset. Something like

hg files "set:size('>1M')"

should do the trick for you. You might need to run it over all revisions, though, as it only operates on one revision at a time. In bash I'd try something like

for i in `hg log -r"all()" "set:size('>400k')" --template="{rev}\n"`; do hg files -r$i "set:size('>400k')"; done | sort | uniq

It can probably be optimized, as it currently does a fair bit of duplicate work and may run for quite a while; on the OpenTTD repository with 22,000 commits it took just short of 10 minutes on my laptop.
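For reuse, the same loop can be wrapped in a shell function with the size threshold as a parameter. This is only a sketch: the function name large_files_in_history is made up for illustration, the hg commands are the ones shown above, and it is not expected to run any faster, since it still calls hg files once per matching revision.

```shell
# Sketch: list every file that exceeded a size threshold in any revision.
# large_files_in_history is a hypothetical helper name, not part of Mercurial.
large_files_in_history() {
    threshold="${1:-1M}"                  # fileset size spec, e.g. 400k or 1M
    fileset="set:size('>${threshold}')"

    # Revisions selected by the same hg log call as in the one-liner above
    hg log -r "all()" "$fileset" --template "{rev}\n" |
    while read -r rev; do
        # Files matching the fileset in that revision
        hg files -r "$rev" "$fileset"
    done | sort -u                        # collapse names repeated across revisions
}
```

Calling large_files_in_history 400k from inside the repository prints each matching path once; redirect the output to a file if the list is long.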

(Also see hg help for templates, files, and grep.)

planetmaker
    Thanks, that works wonders. I am using Windows. For completeness, the PowerShell script is `hg log -r"all()" "set:size('>1024k')" --template="{rev}\n" | Foreach { hg files -r $_ "set:size('>1024k')" >> results.txt; get-content results.txt | sort | get-unique > results2.txt; Remove-Item results.txt; Move-Item results2.txt results.txt }` and the cmd equivalent would be `for /F %i in ('hg log -r"all()" "set:size('>1024k')" --template="{rev}\n"') DO hg files -r %i "set:size('>1024k')" >> results.txt` (that doesn't sort/filter, though; inside a .bat file, write %%i instead of %i) – Michael Dec 15 '15 at 15:13