3

I'm very new to using git, and previously haven't really tried to "organize" any projects I've worked on. I just recently purchased a development server for personal use, however, and I wanted to start organizing all my projects and using version control.

I've spent the past 8 hours researching different recommended methods for organizing files in a project, and I realize that it's a very subjective matter. However, I've developed a system that I think will work for just about any case for me, and I have one very objective question about how to accomplish a certain task with the directory structure.

Presently I'm looking into a structure akin to the following:

src/ - All deliverables in an uncompiled form (PHP files, C source files, etc.)
data/ - Crucial but unrelated data (SQL databases, etc.)
lib/ - Dependencies -- THIS IS WHERE MY QUESTION LIES
docs/ - Documentation
build/ - Scripts to aid in the build process
test/ - Unit tests
res/ - Not version controlled. Contains PSD files and non-diff-able stuff
.gitignore
README
output.zip - Ready-to-install finished product (just unzip and go)

As I mentioned - my real issue revolves around this lib/ directory. This needs to contain all files and programs which my project requires to run, but which are outside of the scope of my project and I won't be editing. Some features that I need this folder to have:

  • Since these are needed for my final product to run, they must be included in output.zip
  • I would like this folder to be version controlled so that anyone who downloads my git repository will have access to all dependencies
  • If several projects have the same dependency, I do NOT want to have 18 redundant copies of the same file on my server
  • I would like to be able to pull these dependencies from other projects of mine (one project should be able to serve as a library for a separate project)

I can avoid having 18 redundant copies of the same file by using a virtual directory (symlink); however, from my understanding, git would copy this symlink as-is into the repository without copying the files. Therefore, if anyone else fetched my repository, they would have a dangling pointer and no libraries.
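To illustrate the dangling-link concern, here is a minimal Python sketch. Git records a symlink as the link's target path rather than the files behind it, so a checkout on a machine without that target gets a link that points nowhere. The directory names (`shared/libA`, `project/lib/A`) and the helper `describe_link` are hypothetical, and removing the shared directory is only a stand-in for "another machine that lacks the target".

```python
import os
import shutil
import tempfile

def describe_link(path):
    """Return (is_symlink, stored_target, target_exists) for a path."""
    is_link = os.path.islink(path)
    target = os.readlink(path) if is_link else None
    # os.path.exists follows the link, so it is False for a dangling one.
    return is_link, target, os.path.exists(path)

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one shared copy of a library, linked into a project.
    shared = os.path.join(root, "shared", "libA")
    os.makedirs(shared)
    project_lib = os.path.join(root, "project", "lib")
    os.makedirs(project_lib)
    link = os.path.join(project_lib, "A")
    os.symlink(shared, link)

    local = describe_link(link)    # target is present on this machine

    # Stand-in for a machine that only has the repository contents:
    # the link is still there, but what it points to is not.
    shutil.rmtree(shared)
    remote = describe_link(link)

print("with target:   ", local[0], local[2])
print("without target:", remote[0], remote[2])
```

The link itself survives in both cases; only the second value changes, which is the "dangling pointer and no libraries" situation described above.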

At first it looked like I could do what I wanted using git-submodule. However, from my understanding, this takes the entire contents of another repository and treats it as a sub-directory. Therefore, if I included "dependency A", my libraries folder would look something like:

/lib/A/src/
/lib/A/data/
...
/lib/A/test/
.gitignore
README
output.zip

In the case of a script (PHP, Perl, etc.) I could probably load the dependency using require('lib/A/src/dependency.php'), but in the case of a DLL or binary file I would have no easy way to read the output file from output.zip. I could have the finished project stored directly at the root level instead of wrapped up in a pretty zip file, but if the project were, say, a website - this could mean hundreds of files cluttering up my repository root.
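One way around the "binary inside output.zip" problem is to have a build step unpack just the artifact that is needed. The following is a minimal Python sketch under stated assumptions: the archive location `lib/A/output.zip`, the member name `bin/dependency.dll`, the destination `build/deps`, and the helper `extract_artifact` are all hypothetical, and the archive is created on the fly only so the example runs on its own.

```python
import os
import tempfile
import zipfile

def extract_artifact(zip_path, member, dest_dir):
    """Extract one member of a zip archive into dest_dir and return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        return archive.extract(member, path=dest_dir)

with tempfile.TemporaryDirectory() as root:
    # Stand-in for a dependency's packaged output (hypothetical contents).
    dep_zip = os.path.join(root, "lib", "A", "output.zip")
    os.makedirs(os.path.dirname(dep_zip))
    with zipfile.ZipFile(dep_zip, "w") as archive:
        archive.writestr("bin/dependency.dll", b"\x00fake binary\x00")

    # Build step: pull only the needed file out of the dependency's zip.
    extracted = extract_artifact(
        dep_zip, "bin/dependency.dll", os.path.join(root, "build", "deps")
    )
    with open(extracted, "rb") as handle:
        payload = handle.read()

print(os.path.basename(extracted), len(payload))
```

This keeps the repository root uncluttered, at the cost of requiring a build step before the dependency can be used.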

How can I include another repository as a library of my own, easily reference the library files within my own project, have the library meaningfully copied to anyone who fetches my repository, and prevent redundant copies of the same files on my development server?

EDIT: After searching on Google for a while I found this similar issue, however it only addresses PHP projects. While an autoloader may allow you to mask the underlying file system in a PHP environment, how would you apply a similar approach to a C++ project? Or a Python project? Or a Java project?

As I thought more about this project today a few other thoughts came to mind which may require a new direction of thought. First is the problem of very deep library nests. If project A depends on project B which depends on project C which depends on project D then you would have a directory structure like so:

A/lib/
A/lib/B/
A/lib/B/lib/
A/lib/B/lib/C/
A/lib/B/lib/C/lib/
A/lib/B/lib/C/lib/D/

Obviously this would not only get annoying, but redundant in its own way.
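For comparison, here is a minimal Python sketch of the alternative to nesting: resolve the whole chain once and place every dependency side by side in a single flat `lib/` folder. The graph literal and the helper `flatten_dependencies` are hypothetical; a real project would read the graph from whatever manifest format its tooling uses.

```python
def flatten_dependencies(graph, root):
    """Return every transitive dependency of root, each once, dependencies first."""
    order = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle involving {name}")
        state[name] = "visiting"
        for dep in graph.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)

    visit(root)
    order.remove(root)  # the root project is not its own dependency
    return order

# The A -> B -> C -> D chain from the question.
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
flat = flatten_dependencies(graph, "A")
lib_dirs = [f"A/lib/{name}/" for name in flat]
print(lib_dirs)
```

With a flat layout, a library that is needed by two different dependencies also appears only once, which the nested layout cannot guarantee.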

How do normal people deal with dependencies when doing a git repository?

stevendesu

4 Answers

3

In the projects I have worked on, submodules are good only for certain cases of dependency management; in other cases they are complemented by other tools. Mostly, I prefer to use submodules when I need the complete repository, for example when I have a common build script that I can share across projects.

There are specific tools focusing on dependency management in various stacks -

etc.

These tools take care of the redundancy management.

Currently, I am on a .NET project, where we have this setup -

  1. PowerShell build scripts shared across projects using submodules. The build script repository contains all third-party executables required to deploy any of our .NET applications and the respective PowerShell wrapper scripts, plus some scripts to load the conventions, config, etc.
  2. A NuGet server (via TeamCity) hosting NuGet packages for common binaries shared across projects. NuGet Package Restore is a feature that allows fetching packages as part of the build.
Srikanth Venugopalan
  • I can understand the advantage to using a language-specific tool for managing redundancy (like using `npm` for node.js), however what do you do when one of your projects is dependent on another? For instance, I have an authentication script in PHP which I've used in **several** projects. This authentication script may be one project in and of itself, but it's also a dependency for many other projects. – stevendesu Mar 09 '13 at 16:22
  • I am not familiar enough with PHP to comment, but since it is an interpreted language like Ruby, I can draw parallels. This question talks about a RubyGems-like feature in PHP: http://stackoverflow.com/a/12244957/326543. Could this be a solution for you? – Srikanth Venugopalan Mar 09 '13 at 16:27
  • I've never heard of Composer, and reading through the introduction it sounds like (with Satis installed) it could do exactly what I need. Sounds like I've got a big learning curve ahead of me =) I'll leave this open a bit longer to see if anyone has a brilliant solution using only directory structures and git, but if nothing turns up then you'll definitely get the green check. – stevendesu Mar 11 '13 at 03:13
  • This makes a lot of sense and adds to @Goran's answer with specific examples. – manojlds Mar 13 '13 at 05:01
2

While it is nice to unify your workflow, you have to respect the beast you're trying to tame. You should have different directory structures for different projects. Having worked on everything from 3D animation projects to PHP projects to C++ projects and everywhere in between, I find that squeezing them to conform to the same workflow just adds work and headache in the long run. Most IDEs have a good "new project" structure straight out of the box, and it is one that other developers will know and understand straight away.

As for the dependency problem, try implementing the superproject approach: http://git-scm.com/book/en/Git-Tools-Submodules

Goran
  • Although it doesn't actually answer my question, a very good answer. Definitely something I may consider after the last 3 days of painfully fighting with git – stevendesu Mar 11 '13 at 03:04
  • I remembered something about superprojects in git that may help you. I have updated the answer. – Goran Mar 12 '13 at 21:04
0

You've asked a general question but also asked specifically about a few instances. I'm going to lean towards being more general. The short answer: this is a build system concern, not a version control system concern.

In the case of Java, there are a few different dependency management/resolution tools that you can use. The build system should understand how to fetch those dependencies at build time and make them available. They are, however, transient: you don't check them in to version control. Maven, for example, uses a /target folder that contains your output (e.g. output.zip) as well as other items such as static analysis output. I'd recommend a dedicated output folder like this too, because it makes cleaning output easier. (What if you have more than one output file? What about variants?) Maven also uses an external directory to locally cache dependencies, but this could be ephemeral and it wouldn't care. Bottom line: none of this is persisted in version control.
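The "fetch at build time, cache locally, never commit" idea can be sketched in a few lines of Python. This is not how Maven itself works internally; it is only an illustration under stated assumptions. The manifest contents, the file naming scheme, and the helpers `restore_dependencies` and `fake_fetch` are hypothetical, and the fetcher is a stub so the example runs without a network.

```python
import os
import tempfile

def restore_dependencies(manifest, cache_dir, fetch):
    """Ensure every dependency in manifest is cached; return name -> file path."""
    os.makedirs(cache_dir, exist_ok=True)
    paths = {}
    for name, version in manifest.items():
        path = os.path.join(cache_dir, f"{name}-{version}.bin")
        if not os.path.exists(path):
            # Cache miss: ask the fetcher (e.g. a download) for the artifact.
            with open(path, "wb") as handle:
                handle.write(fetch(name, version))
        paths[name] = path
    return paths

calls = []

def fake_fetch(name, version):
    """Stub standing in for a download from a central repository."""
    calls.append((name, version))
    return f"{name} {version}".encode()

with tempfile.TemporaryDirectory() as cache:
    # Only this small manifest would live in version control.
    manifest = {"auth-lib": "1.2.0", "image-lib": "0.9.1"}
    first = restore_dependencies(manifest, cache, fake_fetch)
    second = restore_dependencies(manifest, cache, fake_fetch)  # served from cache

print(len(calls), "fetches for", len(manifest), "dependencies across two builds")
```

Because the cache sits outside the project tree, several projects on the same machine can share one copy of each dependency, and deleting the cache only costs a re-download.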

This is not nearly as easy in C++, as far as I know. CMake seems to support building external projects. I've only recently started to play around with this to see what is possible, so I don't want to mislead you by saying "it can easily be done", but it stands to reason that it can be done; the question is only how much work you have to put into it. So whether or not you call the folder /libs, you should make the build treat dependencies as transient (and then good luck with transitive dependencies).

Doug Moscrop
  • To be sure I understand, your recommendation would be not to version control dependencies, but instead to have them in an external file and included via some intelligent build script? If so, how would someone forking my project on github (or similar git host) be able to access these dependencies? – stevendesu Mar 09 '13 at 02:19
  • That's where a general answer probably won't suffice - so, using Java as an example, you can publish artifacts to centralized repositories for anyone to consume. They usually cache them on their local machine, but that's really an optimization. I am not familiar enough with C++ to answer in that regard, but I've been told by colleagues that on Linux it's not so much of a big deal because of the distributions' tendency to have a package management system. Windows is another story. – Doug Moscrop Mar 11 '13 at 15:46
0

Do not embed libraries; this is a security nightmare! When you embed, for instance, an image format library like libpng, libjpeg or libtiff in your application because you want to use its image format, you open up your application to any security vulnerabilities those libraries might contain, and the user has no easy way of knowing that they need to update your program to resolve the security issue. When you leave the dependency outside the scope of your application, the package manager knows about the library and can take action when security vulnerabilities are exposed.

Leave libraries you depend on outside the scope of your project. If you have personally developed libraries that you use in several projects, put each one in its own repository and make separate releases of it.

For Unix-like OSes (Linux/BSD/Solaris/etc.), have users install them separately through their package manager. If you release your software as a package, the package manager will know about your dependencies and install them before it installs your application, so no manual actions are necessary.

For Windows, use a separate bundling process to bundle the libraries you depend upon into a convenience installer which installs the libraries to shared system directories, not your program directory.

There is by the way no technical means in git to do what you want without massive duplication.

wich