5

I was doing a speed comparison between Git and Mercurial.
I chose a big project of 9,072 files (mainly PHP files and several images) with a total size of 95.1 MB.

This is a fake project, which may give someone an idea of how to explain the results I got: it is a WordPress download, unchanged, copied 12 times inside each of two folders - one for the Git repository and one for the Mercurial repository.

I then created a Git repository and committed everything (using TortoiseGit), and after it finished, I did the same in the other folder for Mercurial using TortoiseHg.

Git results:
Time: 32 minutes and 30 seconds to commit everything.
Repository size: 6.38 MB, with only 847 files.

Mercurial results:
Time: 1 minute and 25 seconds - yes, it's only 1 minute.
Repository size: 58.8 MB, with 9,087 files.

I'm not arguing about which is best or anything like that; I'm just trying to understand the differences and how both SCMs created their repositories.

It looks like Hg made a copy of the files with some sort of compression, but I don't understand what Git did.
Can someone explain the results?

P.S.: I know there are already some questions about Git and Mercurial; I'm only trying to figure out the result of this test - and whether it's even a valid test. When I started I was only checking speed, but I ended up with some question marks over my head...

Josh Lee
ronaldosantana
  • 2
    If you are evaluating performance, should you not be using the tools 'raw', instead of dealing with GUI/3rd-party wrappers and all the (unknown? non-uniform?) overhead they impose? – Jeet Oct 20 '10 at 23:06
  • 4
    1) Why evaluate SCMs based on performance? 2) Why do so of the performance of creating a single, 9000 file commit (if I interpreted that correctly)? That is not normal usage. 3) Why try to do this yourself, when there are countless comparisons you can find on Google? – wuputah Oct 20 '10 at 23:14
  • I suspect you are doing something wrong (try checking out the tip from each of the repos and comparing them to make sure everything is checked in). Also, as wuputah says, the conclusions you can draw from this comparison have no relevance. – user318904 Oct 20 '10 at 23:19
  • @Ronaldo Junior: Provide the purpose? What is speed without the purpose. The intricacies between GIT and Mercurial can be read on the web and they are not necessarily designed only for speed. – pyfunc Oct 20 '10 at 23:23
  • @wuputah +1 Couldn't find a better way to say it (I know I tried, I answered before reading your comment... :D ) – Eric-Karl Oct 20 '10 at 23:53
  • 1
    This is Windows since he used TortoiseGit and thus I would _not_ consider it a valid benchmark (Git doesn't run at its best on Windows due to lack of true POSIX support) – alternative Oct 20 '10 at 23:55
  • 2
It's Windows - I'm testing this because these are the tools we're going to use. – ronaldosantana Oct 21 '10 at 00:00
It's not one of the main things to take into account when choosing between them; it is only another test - I'm just trying to understand what happened here. No conclusion between them will be made based on this test; I only want to understand how Git and Mercurial do their job. – ronaldosantana Oct 21 '10 at 00:03
  • 1
Funny how we have a lot of critics here... you ask a real question trying to understand something and people vote it down, while there are lots of "what is the best" questions out there with lots and lots of upvotes. Is it wrong to try to understand a test result? This is not the decision-making point; I'm only trying to understand how Git and Mercurial work. – ronaldosantana Oct 21 '10 at 00:08
Part of your question was whether it's a valid test. I can't know that without knowing the purpose. And as I thought about that, more questions came to mind. In short, the problem is not the premise of your question, but how it was asked. – wuputah Oct 21 '10 at 01:07
  • @Ronaldo: What you are testing is not Git and Mercurial, but TortoiseGit and TortoiseHG. – Jakub Narębski Oct 21 '10 at 20:21
  • @wuputah: -1 to your hate comments. +1 to jleedev 's answer (13 upvotes as of now), who, at least, is contributing something useful to this question, unlike your hate-comments. Read *jleedev*'s answer and come back and tell us if you don't think *Ronaldo Junior* didn't just create something worthy of addition to *Stack Overflow*. – SyntaxT3rr0r Oct 25 '10 at 00:32
  • @Webinator: Wow dude, the only person being hateful here is you. I was responding to Ronaldo's comments. I believe the reason why the question was not received positively was how it was asked. Being critical != being hateful. – wuputah Oct 25 '10 at 01:04

3 Answers

18

Get your tools checked; both hg and git (command line) import these trees in about a second. Consider the command-line versions of the tools in preference to the GUI wrappers.

You’re running into a case where git excels and hg is less efficient. Mercurial uses a separate file as the revlog of each file, while git likes to keep things more unified. In particular, copying the same directory twelve times takes virtually no extra space in git. But how often does that happen? I hope not very. If you routinely import thousands of files and images, and not just as the initial commit, a DVCS may not be the right tool for you. Something like rsync or a centralized VCS would be better — a DVCS is generally tuned for a single project that holds text files and receives patches and merges over time. Other kinds of tools make different tradeoffs.

There’s really not much point importing large directory trees and carefully examining the files that appear; you can read the documentation if you like. The main lesson here is that git keeps a snapshot of the entire directory structure, which allows it to efficiently pack things (the bundle for wordpress is 2.7MB, which is no larger than the tarball), but it can be more expensive to compute diffs. Mercurial maintains a lot more per-file information like logs and diffs, which means that accessing the log of just one file is much faster than in git, but lots of identical files and directories can have a higher space cost.
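Git’s ability to store twelve identical copies for free follows from content-addressed storage: a blob’s name is the SHA-1 of its content prefixed with a `blob <size>` header, so identical files collapse into one object. A minimal Python sketch of that hashing scheme (this models how object ids are computed, not git’s full on-disk code path):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object id git assigns to file content: SHA-1 over a
    'blob <size>\\0' header followed by the bytes themselves."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Twelve identical copies of a file all hash to the same object id,
# so they occupy the space of a single blob in the object store.
ids = {git_blob_id(b"hello\n") for _ in range(12)}
print(ids)  # one id, matching `git hash-object` on the same content
```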

I can create a pathological case, too. Here’s one where git wins:

# assumes the current directory has been initialized as both
# a git and an hg repository (git init; hg init) beforehand
for dir in {1..100}; do
  mkdir $dir
  for file in {1..100}; do
    touch $dir/$file
  done
done
hg add {1..100}; hg commit -m tweedledee
git add {1..100}; git commit -m tweedledum

Yep, that’s 10,000 empty files across 100 identical directories. Git imports the entire thing in a tenth of a second, and the commit itself is less than a kilobyte. Mercurial, which creates a logfile for each file, takes about four seconds to commit the entire thing, and ends up with 10140 new files in .hg, totalling 40MB.
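The asymmetry above can be approximated without running either tool: git keys storage by content, so all 10,000 empty files share a single blob (and the 100 identical directory trees share one tree object), while Mercurial keeps one filelog per tracked path regardless of duplication. A rough Python illustration of that bookkeeping, under those assumptions rather than the tools’ actual on-disk formats:

```python
import hashlib

# 100 directories x 100 empty files, as in the shell loop above
contents = {f"{d}/{f}": b"" for d in range(1, 101) for f in range(1, 101)}

# git: objects are keyed by content hash, so duplicates collapse
unique_blobs = {hashlib.sha1(b"blob 0\x00" + c).hexdigest()
                for c in contents.values()}

# hg: one filelog per tracked path, duplicates or not
filelogs = len(contents)

print(len(unique_blobs), filelogs)  # 1 unique blob vs 10000 filelogs
```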

Here’s one where Mercurial wins:

# assumes the current directory has been initialized as both
# a git and an hg repository (git init; hg init) beforehand
mkdir -p a/b/c/d/e
for i in {1..1000}; do
  echo hello >> a/b/c/d/e/file
  hg add a; hg commit -m "Commit $i"
  git add a; git commit -m "Commit $i"
done

That’s one thousand commits, each introducing a tiny change in a deeply nested file. Each commit in git introduces eight new objects, which are individually deflated but stored as separate files. Eventually, git decides to repack, which takes time. Unpacked, the whole thing is about 32MB, and packed it’s 620K. Mercurial, on the other hand, simply appends a few notes to a few logfiles each time, and the .hg is 396K at the end.
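The reverse case can be sketched the same way: until a repack, each git commit writes a full, individually deflated copy of the changed file as a loose object, while a revlog appends roughly just the new bytes. A back-of-the-envelope model in Python using zlib (the sizes are illustrative, not exact on-disk figures):

```python
import zlib

git_loose = 0   # git: one full deflated blob per commit (before repack)
hg_revlog = 0   # hg: roughly the appended delta per commit
content = b""
for i in range(1, 1001):       # 1000 commits, as in the loop above
    content += b"hello\n"      # each commit appends one line
    git_loose += len(zlib.compress(content))  # full snapshot, deflated
    hg_revlog += len(b"hello\n")              # append-only delta

print(git_loose, hg_revlog)  # loose-object total exceeds the revlog total
```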

What’s the point of all this? The point is that none of the cases discussed in this thread are realistic. In everyday usage, with realistic repositories, both tools are great. Just learn one.

The manuals themselves don’t exactly show you from beginning to end how a commit is constructed, but Git Internals in Pro Git, Internals in the Mercurial wiki, and Mercurial Internals from PyCon 2010 should get you started.

Vadim Kotov
Josh Lee
  • 3
And FYI, there's a feature being worked on in Hg, called lightweight copy, which eliminates most of the overhead we have when files are copied around. – tonfa Oct 21 '10 at 09:27
  • Thanks for the explanation and for the links. That will do :) – ronaldosantana Oct 21 '10 at 19:40
  • I agree with Christophe, great message! So often I find that knowing about the internals provides the framework necessary for my mind to understand and memorize the way a tool works, and to make sense of it. – Lumi Mar 24 '11 at 16:23
2

I suggest you compare DVCSs on features and workflow rather than speed and disk space. Disk space is pretty cheap, and both Git and Mercurial are pretty efficient at storage. As for speed, neither one will let you down even on very big projects. Go for features and the one that agrees with the workflow you use (or want to use).

As for the difference in storage space in your example, Git doesn't track individual files, so it notices the content being repeated and stores it more efficiently (while taking more time)... yet how often does that happen in real life?

I suggest you read mpe's linked posts/articles too. :D

Eric-Karl
Hi Eric, as I said in the comment above, it's not from this test that we're going to make a decision. As I put in the question, I just want to understand the stuff behind the scenes. – ronaldosantana Oct 21 '10 at 00:04
1

That doesn't sound like a very good test, i.e. it's not often that you commit a project with no history and 12 identical copies of the same content.

What is the Difference Between Mercurial and Git?

Git and Mercurial - Compare and Contrast

http://www.wikivs.com/wiki/Git_vs_Mercurial

Community
mpe
Thanks - I read those already and really like both SCMs. I only want someone to try to explain the different results, that's it. – ronaldosantana Oct 21 '10 at 00:06