
In the past I used the largefiles extension in Mercurial to store data together with the code I was working on. I think this was a mistake, and I would like to remove the "largefiles" directory (8 GB). Our network user directories are limited to 10 GB, and I need the space. I have not used any of the large files for a long time now, and I will not miss them when they are gone forever.

So my questions are:

  1. Can I remove the largefiles directory under .hg without damaging the repo?
  2. If I do, will I be able to check out old code, even if some large datafiles are missing?
  3. Should I remove those files from all clones of that repo to avoid polluting all repos again with largefiles from another clone?
marco.m
Stephan

2 Answers


For your first question I did an experiment:

  1. Created a repo with a large file.
  2. hg update null
  3. Deleted .hg\largefiles
  4. hg update

The large files came back! It turns out that, at least on Windows, the large files are also cached in %UserProfile%\AppData\Local\largefiles. Since this was my only largefiles-enabled repository, the cache contained just my one large file, so I deleted that too. Be careful here: this cache holds large files from every local largefiles-enabled repository on the machine. If it seems wasteful to keep two copies, note that when a repository is on the same drive as %UserProfile%, the copies are hardlinked. I have two drives in my system, and a repository on a different drive is still copied to the AppData location, but it is not hardlinked, so it doubles your disk usage.
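On a Unix-like system you can confirm whether two paths are hardlinked by comparing their inode numbers. A minimal illustration with scratch files (not real largefiles data; `stat -c` is GNU coreutils syntax):

```shell
# Create a file and a hardlink to it, then compare inode numbers.
# Hardlinked paths share one inode, so no extra disk space is used.
echo data > original.dat
ln original.dat linked.dat
stat -c '%i' original.dat linked.dat   # identical numbers => hardlinked
```

On Windows, `fsutil hardlink list <file>` serves a similar purpose.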

Once all copies of the large file were deleted, an hg update gave:

1 files updated, 0 files merged, 0 files removed, 0 files unresolved
getting changed largefiles
largefile.dat: can't get file locally
(no default or default-push path set in hgrc)
0 largefiles updated, 0 removed

I then removed the [extensions] / largefiles= lines from .hg\hgrc to disable the extension. At this point the repository worked fine, but changesets that used to have large files still contained the .hglf directory with hashes. So the answer to your second question is yes, you can check out old code.
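As Stephan notes in the comments below, some Mercurial versions also list largefiles in .hg/requires, and the repository aborts with "unknown repository format" until that line is removed. A hedged sketch of that edit, demonstrated on a scratch copy rather than a real repo (in a real repo you would operate on .hg/requires directly, after backing it up):

```shell
# Simulate a repository's requires file in a scratch directory.
mkdir -p demo/.hg
printf 'revlogv1\nstore\nlargefiles\n' > demo/.hg/requires
cp demo/.hg/requires demo/.hg/requires.bak           # keep a backup
grep -v '^largefiles$' demo/.hg/requires.bak > demo/.hg/requires
cat demo/.hg/requires                                # largefiles line is gone
```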

For your third question, to eliminate all traces of largefiles and the hashes, create a filemap file containing:

exclude .hglf

and run:

hg convert --filemap <file> <srcrepo> <destrepo>

Your users will then have to clone this new, modified repository, because convert rewrites the changesets, making the new repository unrelated to the old one.
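For instance, the filemap can be created like this (the name filemap.txt is an arbitrary choice; the hg convert run itself needs the bundled convert extension enabled):

```shell
# Write the one-line filemap that strips the .hglf directory.
cat > filemap.txt <<'EOF'
exclude .hglf
EOF
cat filemap.txt
# Then, with Mercurial installed and the convert extension enabled:
#   hg --config extensions.convert= convert --filemap filemap.txt srcrepo destrepo
```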

Mark Tolonen
  • Thanks for this clear and detailed response! Indeed, the largefiles directory comes back immediately, but empty (once I removed the User-Cache). Then, when I disable the extension in .hg/hgrc, the next time I type `hg st`, this happens: `abort: unknown repository format: requires features 'largefiles' (upgrade Mercurial)!`. Once I enable the largefiles extension, it works again. So I can't get across your second point right now. – Stephan Jan 22 '13 at 11:50
  • Did you make sure `.hg\largefiles` was still deleted after disabling the extension? – Mark Tolonen Jan 22 '13 at 14:53
  • 1
    Hi, yes I did. But I know now where the problem was. There is another file called `.hg/requires`, which contains a line `largefiles`. I simply deleted that line and now everything works. Maybe you used a different version of mercurial and didn't have this file. Anyway, thanks a lot for your help here! – Stephan Jan 22 '13 at 18:30
  • Thanks Mark. I stripped the last few changesets, as my mistake was quite recent. Also (it pays to read the comments) Stephan's `.hg/requires` method worked for me too. As a bonus, stripping the changesets from KilnHg in the browser also worked seamlessly. TortoiseHg 3.2 (and therefore Hg 3.2) has an option in the local repo's Strip command to "do not modify working copy" - tick it. Oh, you did all this on a copy of the repo, right? – CAD bloke Nov 18 '14 at 18:18

The same command that converts a plain repository to a largefiles repository, lfconvert, can also be used in the other direction:

$ hg --config extensions.largefiles= help lfconvert
hg lfconvert SOURCE DEST [FILE ...]

convert a normal repository to a largefiles repository

Convert repository SOURCE to a new repository DEST, identical to SOURCE
except that certain files will be converted as largefiles [...]

Use --to-normal to convert largefiles back to normal files; after this,
the DEST repository can be used without largefiles at all.

So the following command will do the trick:

$ hg --config extensions.largefiles= lfconvert --to-normal <LARGEFILE_REPO> <PLAIN_REPO>

You will need to coordinate with your team, so that:

  1. everybody pushes their latest changes to the largefiles master repo
  2. access to the master repo is disabled forever (to avoid accidental pushes)
  3. everybody removes the largefiles extension from their $HOME/.hgrc
  4. the largefiles extension is removed from the hgrc of the user offering access to the master repos (the location of that hgrc depends on how the master repos are served, SSH or HTTP). This makes it impossible for somebody to accidentally add a largefile to a clone of the new repo and push it!
  5. perform conversion of master repo to plain repo
  6. decide on name/path change (if any) for the new master repo
  7. enable access to new, plain master repo
  8. everybody clones the new plain repo

Note that lfconvert is only available while the largefiles extension is enabled. My suggestion, per point 3, is to remove it from $HOME/.hgrc and enable it for a single command with the --config extensions.largefiles= option, as shown in the example above.

Note also that converting to a plain repo enables use of the recent fsmonitor extension, which uses the kernel's inotify mechanism (or its equivalent on macOS) to dramatically speed up operations like hg status. For example, in one huge repository of mine, hg status went from 10 seconds to 0.5 seconds :-)

marco.m