
I'm using git for a medium sized project: ~800 commits and maybe 100 or so files. My .git folder is 18.8 MB (that I can understand; it seems to be on the order of the included file sizes) and 5586 files. That seems like too much - I would even say ridiculously so.

Many files are hard on the file system, and even harder if you have to sync that folder. Is that how it's supposed to be? Any way of lowering that? My naive way of handling something like this would be to just put all the needed files inside an archive of some kind.

Basti

1 Answer


tl;dr: This is normal. Don't worry about it.

Run `git gc` if you like, but that will be run automatically.
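
If you want to see the effect yourself, here is a quick sketch using standard Git commands (the exact numbers will of course vary per repository):

```
# Count loose objects ("count") vs. objects already inside packfiles ("in-pack").
git count-objects -v

# Repack loose objects into packfiles and prune unreachable ones.
git gc

# Count again; "count" should now be much smaller.
git count-objects -v
```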

> Many files are hard on the file system

No. On certain types of file systems, many files in a single directory can make finding files in that directory slow, particularly on file systems that store directory contents as a linked list, since they have to walk the whole list. This was a problem on FAT32 and ext2.

Modern file systems like NTFS (Windows), ext3 and ext4 (many Linuxes), and HFS+ (OS X) handle large numbers of files in a directory efficiently by indexing directory entries with B-tree variants.

Furthermore, Git was developed by kernel developers, and they know what they're doing. Git does not put its objects in a single directory; it breaks them up into subdirectories named after the first two characters of the object ID. Since object IDs are hashes, the objects are evenly distributed over many directories.
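
For illustration, this is roughly what the fan-out looks like on disk (the object ID below is just an example):

```
# An object with ID d670460b4b4aece5915caf5c68d12f560a9fe3e4 is stored at
#   .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
# The first two hex characters name the subdirectory, the remaining 38 the file.

# List the fan-out subdirectories in your own repository:
ls .git/objects
```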

Finally, recent versions of Git will periodically reduce the number of individual object files by compressing them into packfiles.
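
You can peek at those as well; assuming a standard layout, packfiles live under `.git/objects/pack/` (the file names will differ in your repository):

```
# Each .pack file bundles many objects; the matching .idx file is its index.
ls .git/objects/pack/
```

The automatic packing happens via `git gc --auto`, which kicks in once enough loose objects (the `gc.auto` setting, 6700 by default) have accumulated.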

> even harder if you have to sync that folder

This implies you've put Git onto a shared drive like Dropbox. Putting Git on Dropbox is like disassembling a truck and mailing it to yourself in the post: it's slow, expensive, you're likely to lose pieces, and you could have just driven the truck. Dropbox can kill Git performance and corrupt the repository. Anything with slow seek times, like a network drive, is very bad for Git, which uses the filesystem as a simple object database.

Git is a distributed version control system. If you want to distribute your repository, use Git to do it; it's very efficient at it. You can keep your repo on Dropbox, but use `git-remote-dropbox` to do it safely. You can use an existing Git hosting service like GitHub or GitLab. Or you can put a bare repository somewhere you have ssh access to.
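
As a minimal sketch of that last option (the host name and paths are placeholders; adjust the branch name to yours):

```
# On the machine you have ssh access to: create an empty bare repository.
ssh user@myhost 'git init --bare ~/repos/myproject.git'

# On your machine: point a remote at it and push.
git remote add origin user@myhost:repos/myproject.git
git push -u origin master   # or "main", depending on your branch name
```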

Schwern
  • git gc worked - it told me to prune, and after doing both it's now a ~10-fold reduction in the number of files. Thanks. Many files are still a major problem in general though. The fact that I can watch folder properties count the number of files is evidence that they are an issue. Also Dropbox is way easier than setting up some kind of server - I know that that infuriates many people but that's how it is. I'll stick with that. And DB is absolutely wrecked by lots of files. – Basti Jun 26 '20 at 15:22
  • @Basti "*The fact that I can watch folder properties count the number of files is evidence that they are an issue.*" No it isn't. Git doesn't have to count the files, it accesses them directly which is very efficient. You're concerned about corruption and speed, and using Git on Dropbox can corrupt a Git repository and slow it down. [Read the answers about Git & Dropbox](https://stackoverflow.com/questions/1960799/using-git-and-dropbox-together-effectively). Consider using [`git-remote-dropbox`](https://github.com/anishathalye/git-remote-dropbox) and read its FAQ. – Schwern Jun 26 '20 at 17:22
  • It's an issue for Windows is what I meant. I sync my repo over DB, then pause. It's the best solution. I don't want to use some third party stuff since Dropbox is kinda ubiquitous (at least in my environment). It's also an added backup in case git decides to explode. Mortals like me are no git masters - I would never use any commands besides commit (via gui) and look at the log. Git is a very fragile tool for people like me. – Basti Jun 26 '20 at 17:46
  • @Basti Yes, if you don't put in effort to learn your tools, your tools won't work well. If you tell yourself you can't, you won't. Git *is* hard, though it's gotten easier, but it's powerful and ubiquitous and worth it. You can [learn Git](https://git-scm.com/book/en/v2/), it's surprisingly simple once you get it. You'll have more enjoyment using your tools well. And people won't get so infuriated with you. – Schwern Jun 26 '20 at 18:14
  • Whether it's worth it depends on your use case. For me it's not. The insane syntax makes it very time-consuming to maintain the knowledge once learned. Not worth it for such a simple tool if all you need is to look at what you did before. – Basti Jun 26 '20 at 18:32
  • Just a quick "fun" note. While NTFS is able to handle a folder with thousands of files, be aware that trying to open the folder in Windows Explorer, and doing anything other than just scrolling, like, say ... sorting? will be absolutely ludicrous. I have a folder with 68.000+ files (don't ask, not my design), and NTFS is super-fine about the whole thing, but Windows Explorer is taking like 30+ minutes to sort the contents. – Lasse V. Karlsen Jun 26 '20 at 19:09
  • On a more serious note, Microsoft has had a few posts about things they've done with a whole "git file system" which handles things in a more lazy manner. I would think that some of those things will become available to the public in the future. They manage things like the Windows operating system and so on with millions of files in the repository, and have done some thinking-out-of-the-box kinda design changes to how they deal with that. – Lasse V. Karlsen Jun 26 '20 at 19:11