
I am new to Git and GitHub. I recently learned the basics of Git (adding, pushing, pulling, cloning, etc.). My intro to Java professor asked me to make a GitHub repository for all my class homework. She told me to organize it so that there is a separate folder for each homework, and each homework folder contains multiple source files.

So I set up my files like this: Java (main folder) -> Hw1 + Hw2 + Hw3 etc. How would I do this using Git? All of these folders should be in both my local and GitHub repositories, and I should be able to make changes to them separately.

Thank you in advance. I am stuck.

  • You do all operations in the `Java(main folder)`, it should be able to pickup changes in the subfolders – Anurag Srivastava Jun 13 '22 at 15:56
  • Whenever I try to do git add . in the java folder, it says "fatal, not a git repo". Then I thought, let's make it a git repo by running git init. After doing that I tried git add . again, which gave me this error: "warning: adding embedded git repository: HelloWorld" – muhammad muzaib Jun 13 '22 at 16:00
  • You probably have git repos in your subfolders as well. They can be removed if you don't need to maintain their remote repos individually – Anurag Srivastava Jun 13 '22 at 16:04

1 Answer


Let's start with some basics. You already understand that your computer uses a tree-structured file system: that is, a directory (or folder—the terms are now interchangeable) holds files and/or more directories/folders, which in turn hold more files and/or folders, etc. Windows natively uses a backwards slash \ to separate the various components, so that you might have:

java\hw1\main.java
java\hw1\sub.java
java\hw2\main.java

and so on. Windows can use forward slashes (some commands may use them for other purposes, but they do work in file names), and all non-Windows OSes tend to use forward slashes, which are easier to type. Git also uses forward slashes so that's what I'll do here.

(Aside: Windows and macOS by default use "case insensitive but case preserving" rules, so that if you create a file named readme.txt, you can later open it using the name ReadMe.txt or README.TXT but it remains named in all-lowercase. Git, by contrast, is usually case-sensitive and thinks that readme.txt, ReadMe.txt, and README.TXT are three different file names. This causes endless grief on such systems,[1] and sometimes the best, or at least easiest, way to avoid all problems here is to completely avoid uppercase letters everywhere. To the extent that you can use java instead of Java, hw1 instead of Hw1, and so on, I would encourage you to do so.)

When you ask Git to create a new, empty repository using git init,[2] Git creates a hidden folder named .git. This hidden folder will contain all of Git's files: here, Git will store its main two databases. We'll talk about those in just a moment. The place where Git creates .git is whatever your current working directory is, so if you are in java/hw1 and run git init, Git creates java/hw1/.git. If you are in java and run git init, Git creates java/.git.

Note that java/.git and java/hw1/.git are different folder path names, and therefore you can create two repositories. You do not want to do this, but that's what you did. (I base this claim on this comment.) We'll come back to "how to fix this" soon.
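To see which folders currently hold their own repository, one option is to search for the hidden .git folders. The sketch below builds a throwaway copy of the layout in a temporary directory (the folder names mirror the question; the temporary location and the variable names are assumptions for illustration) and lists every .git it finds:

```shell
# Build a scratch layout in a temporary directory (illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/java/hw1"

# Running git init in two different directories creates two repositories.
(cd "$demo/java" && git init -q)
(cd "$demo/java/hw1" && git init -q)

# List every .git folder at or below java/ without descending into them.
found=$(cd "$demo/java" && find . -name .git -type d -prune | sort)
echo "$found"
```

Running the same find command inside your real java folder should show one line per repository; more than one line means nested repositories.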


[1] In particular, someone using Linux can literally create three different files that differ only in case, stuff all three into a commit in a Git repository, and leave you with a problem when you go to check out this commit on a Windows system. If you're used to the system mapping from typed-in-lowercase to the matching case, and you ask an editor to create java/hw1/thing.java on Linux, it might actually create a java and hw1 right next to your existing Java and Hw1. Since those are different directories they can store different files with the exact same names as those in Java/Hw1/, including name-case. Git will happily store all these files, and Windows often cannot extract such a commit properly.

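[2] Note that git init first checks whether the current directory already holds a repository (that is, already has its own .git). If so, rather than creating a new repository, Git will "reinitialize" the existing one. In most cases "reinitializing" like this has no effect at all. Running git init in a subfolder of an existing working tree, by contrast, creates a new, nested repository, as described above.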


The main thing to know about Git and a Git repository

A Git repository—or what I sometimes call the repository proper—consists mainly of two databases. One is usually much bigger. It contains commits and other supporting Git objects. These objects all have hash IDs (or more formally, object IDs or OIDs) that Git must have in order to retrieve the objects from the database. This could force humans to memorize Git commit hash IDs, but that's a bad plan: hash IDs are very large, very random-looking, and impossible for humans to remember in general.

For this reason, a Git repository contains that second, usually much smaller, database. In this database, Git stores names: branch names, tag names, remote-tracking names, and many other kinds of names. These names are for you (and other humans) to use. Each name stores one hash ID, but that's enough to make everything work. So you'll use a branch name, like main or master. This name holds the hash ID of the latest commit, which allows Git to retrieve that commit.
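As a rough illustration of the names database, the following sketch makes one commit in a scratch repository and then asks Git which hash ID the name main holds. The temporary directory, the identity "Student", and the branch name main are assumptions made for the example.

```shell
# Scratch repository in a temporary directory (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"

echo example > README.txt
git add README.txt
# Identity passed inline so the sketch does not depend on global config.
git -c user.name="Student" -c user.email="student@example.com" \
    commit -q -m "Initial commit"

# The branch name resolves to the hash ID of the latest commit.
git rev-parse main

# Every name in the names database, with the hash ID it holds.
git for-each-ref --format='%(refname) %(objectname)'
```

The hash ID printed will differ on every run, which is exactly why the name is what you type.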

Each commit stores two things:

  • A commit stores a full snapshot of every file (that Git knew about, that is) at the time you, or whoever, made that commit. The files inside the commit are stored in a special, read-only, Git-only, compressed and de-duplicated form, that only Git can read, and literally nothing can write. (This uses some of those "supporting objects" I mentioned; the files are actually stored in the objects database as "Git objects".) Because nothing but Git can use these files, the files in a commit are useless on their own. We'll see in just a moment how we work with these files.

  • Meanwhile, that same commit that's storing a snapshot, also stores some metadata, or information about the commit itself: who made it (you, probably), and when, for instance. To make "branches"—a poorly-defined word in Git (see What exactly do we mean by "branch"?)—work, the commit's metadata contains the hash ID of the previous commit.
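Both halves of a commit can be inspected directly. This is a minimal sketch in a scratch repository (the file contents, the identity "Student", and the temporary location are placeholders): git cat-file -p prints the metadata, and git ls-tree lists the files in the snapshot.

```shell
# Scratch repository with two commits (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

mkdir hw1
echo 'class Main {}' > hw1/main.java
git add hw1
git commit -q -m "Add hw1"

echo '// revised' >> hw1/main.java
git add hw1
git commit -q -m "Revise hw1"

# Metadata: tree (the snapshot), parent, author, committer, message.
git cat-file -p HEAD

# Snapshot: every file recorded in this commit.
git ls-tree -r --name-only HEAD
```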

This "contains previous commit's hash ID" is how Git stores history: the branch name, e.g., main, lets Git find the last commit you made, and then by reading that commit, Git can find the hash ID of the second-to-last commit. For instance, suppose the hash ID of the last commit is H (it's actually some big ugly hexadecimal number so we're just using H to stand in for it). Then we say that the name main points to commit H. But commit H contains the hash ID of an earlier, or parent, commit: let's call that one G. We say that H points to G, and we can draw that:

        <-G <-H   <--main

Since G is a commit, it has one of these points-to pointers sticking out of it, too. By reading commit G's metadata, Git can find the raw hash ID of its parent; let's call that commit F:

... <-F <-G <-H   <--main

So main points to H, which points to G, which points to F, which points to ... well, this goes on until we get back to the very first commit ever—commit A perhaps—which, being first, can't point backwards and therefore simply doesn't.

What this means is that instead of one hash ID, each commit stores, in its metadata, a list of previous-commit hash IDs. The list can be empty, and is for that first commit. It can also have more than one hash ID, but we won't cover this case here. Most commits in most repositories are "ordinary" commits and have exactly one parent, though.
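The backward-pointing chain can be made visible with git log. The sketch below assumes a scratch repository with three commits on main (file name and messages are placeholders); it prints each commit next to its parent and then confirms that the first commit's parent list is empty.

```shell
# Scratch repository with three linear commits (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

for n in 1 2 3; do
    echo "version $n" > notes.txt
    git add notes.txt
    git commit -q -m "commit $n"
done

# Newest first: each commit's short hash, then its parent's short hash.
git log --format='%h <- %p' main

# The very first commit is the one with no parents at all.
root=$(git rev-list --max-parents=0 main)
git log -1 --format='parents of first commit: [%P]' "$root"
```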

Your "working tree"

A repository, then, stores names—branch names for instance—that help Git find commits for us (we only have to remember the branch names), and stores commits that then store files. But the stuff in the commits (along with the actual commits themselves) is all completely read-only. Git must do this to make the hashing scheme work. What good are stored files if we can't write on them? Moreover, only Git can read them, so what good are they if we can't even read them?

This is where your working tree comes in. Most Git repositories have a working tree.[3] The working tree of a repository is, quite simply, where you do your work. And, as we saw earlier, if you use git init in some directory to create a new, totally-empty repository and then make an initial commit:[4]

mkdir new
cd new
echo example > README.txt
git init
git add README.txt
git commit

you will wind up with a hidden .git folder here in the new/ folder we just made (mkdir new) and entered (cd new). The working tree for the Git repository in new/.git is new/, and the file we created in that working tree, README.txt, is now also stored in the first (and so far only) commit in that repository.

If we now modify the one file, and/or add a new file, and use git add and git commit appropriately, we'll get a second commit that stores (forever[5]) the new versions of the files. That second commit has, as its parent commit, the first commit, which stores (forever) the earlier version with just the one file in it.

The second commit is now our current commit, and is now the last commit on the main or master branch (whatever its name is).

Git allows us to check out any commit we have stored in the repository. When we do that, Git will erase from our working tree the files that go with the current commit. It will, instead, install into our working tree the files that go with the newly selected commit—which then becomes the current commit.

In this way, we can "go back in time", any time we like, to any older version, stored as a commit in the big database. All we have to do is find its commit hash ID (for which git log comes in handy, for instance). That's not what we'll focus on right now though.
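Here is a small sketch of that "go back in time" step, assuming a scratch repository with two commits (file names and contents are placeholders). Checking out the first commit by hash ID swaps the working-tree files, and checking out main brings the latest ones back.

```shell
# Scratch repository with two commits (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

echo "first version" > README.txt
git add README.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo "second version" > README.txt
echo extra > notes.txt
git add README.txt notes.txt
git commit -q -m "second"

# Go back in time: the working tree now matches the first commit.
git checkout -q "$first"
old=$(cat README.txt)
old_listing=$(ls)
echo "$old"
echo "$old_listing"

# Return to the latest commit on main.
git checkout -q main
new=$(cat README.txt)
echo "$new"
```

Note that notes.txt disappears while the first commit is checked out and reappears afterwards, because it exists only in the second snapshot.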


[3] The exception here is a so-called bare repository. We won't cover these here.

[4] These are Unix-shell-style commands, as I don't use Windows myself, but they should work in git-bash, which is just a port of bash to Windows for use with Git. You can do all this in PowerShell or even CMD.EXE instead, but some command details might change.

[5] Well, forever, or as long as the commit itself continues to exist. If we remove the commit, we remove its snapshot. This is actually kind of hard to do! However, if we remove the repository proper, we destroy the two databases, which removes all commits, and this is pretty easy to do.


"Nested" repositories: the thing you didn't want, but made

Given that the computer—the host operating system, which is in your case Windows, but this is also true of macOS and Linux—demands and uses a tree-structured file system, we can set up a structure like this:

java
  .git
    <various Git repository control files and databases>
  hw1
    .git
      <various Git repository control files and databases>
    main.java
  hw2
    .git
      <various Git repository control files and databases>
    main.java

and so on. Here we have one repository per hw directory plus one overall containing repository in the java directory.

But here's the problem: Git literally cannot store a Git repository inside a Git commit.[6] Instead of doing so, the "outer" repository (in this case the one in java/.git, whose working tree is the java/* files) will store what Git calls a submodule using what Git calls a gitlink. To store a submodule correctly, you must use git submodule add, not git add; git add creates or updates only the gitlink, which is sort of half a submodule.

If someone does want submodules (but you don't), this git submodule add method is how to make them. The result is that when you clone the java repository, you get files, plus the magic gitlinks, that Git will need in order to run additional git clone commands, one for each submodule. This way, the person who clones the java repository can run git submodule update --init to run a bunch more git clone commands. But again, that's not what you want.
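To check whether an outer repository has recorded a gitlink rather than real files, you can look at the staged entries. The sketch below recreates the accidental setup in a temporary directory (paths, file contents, and the identity "Student" are assumptions); an entry with mode 160000 is a gitlink.

```shell
# Recreate the accidental nesting in a scratch directory (illustrative only).
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

# Inner repository: hw1 has its own .git and one commit.
mkdir -p "$demo/java/hw1"
cd "$demo/java/hw1"
git init -q
echo 'class Main {}' > main.java
git add main.java
git commit -q -m "hw1 work"

# Outer repository: plain git add on hw1 records only a gitlink
# (and prints the "adding embedded git repository" warning).
cd "$demo/java"
git init -q
git add hw1

# Mode 160000 marks a gitlink: a commit hash, not the files inside hw1.
staged=$(git ls-files --stage)
echo "$staged"
```

If your own java repository shows a 160000 line for a homework folder, that folder's files are not actually being saved by the outer repository.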


[6] There are some tricks to get around this problem if you really need to do it, but it's not a good idea in general. The recent safe.directory stuff is an outgrowth of a security issue that resulted in a CVE when someone discovered such a trick. The tricks that Git allows involve renaming the .git directory; the ones it doesn't allow, or accidentally allowed in the past, result in CVEs.


Fixing the mess

The observations we should make at this point are these:

  • Git stores commits. It doesn't store files (though commits do store files). It stores commits.
  • What you want is a single repository with multiple commits, where the first commit (or maybe the second; see below) contains a file named hw1/main.java,[8] but no files named hw2/whatever.
  • What you have now are multiple repositories: one, a superproject, with submodules (or half-submodules) named hw1, hw2, and so on, and then more repositories that get cloned into hw1, hw2, and so on, each containing a main.java and whatever other files.

Now, if we assume (or you verify) that you do not need to save any of the commits in any of these repositories so far, what we can do is simply delete all the .git folders and their contents.

That is, on a Unix-like shell, we would run:

cd java
ls               # make sure we're in the right place
rm -rf .git      # remove this working tree's Git repository
rm -rf hw1/.git  # remove the Git repository in hw1/
rm -rf hw2/.git  # and so on ...

Note that we're using the OS's remove command, with the "remove everything without asking" options, on the hidden Git folders. Git has no opportunity to stop us: we're totally bypassing Git here. All of Git's files, including the two big databases, get completely removed. This is likely to be irrecoverable (depending on your OS and whether you're using the OS's "remove irrecoverably" command, or its "move to trash so I can get it back if I change my mind" command, and also depending on whether you have good backups, e.g., macOS Time Machine).

We now have only all of the working trees, with no .git folders: there are no repositories left. But all of the files are still there because the checked-out files were, and still are, in the working trees.
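If there are many homework folders, typing one rm per folder gets tedious. One possible variation, sketched here on a scratch copy of the layout (the java-backup name and the temporary location are made up for the example), keeps a full backup first and then removes every .git folder with a single find command:

```shell
# Scratch copy of the nested layout (illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/java/hw1" "$demo/java/hw2"
echo 'class Main {}' > "$demo/java/hw1/main.java"
echo 'class Main {}' > "$demo/java/hw2/main.java"
for d in java java/hw1 java/hw2; do
    (cd "$demo/$d" && git init -q)
done

# Keep a full copy, repositories included, in case the old commits matter.
cp -R "$demo/java" "$demo/java-backup"

# Remove every .git folder at or below java/, however deeply nested.
cd "$demo/java"
find . -name .git -type d -prune -exec rm -rf {} +

# No .git should be left; the source files are untouched.
leftover=$(find . -name .git)
files=$(find . -type f | sort)
echo "$files"
```

The same caution applies as with the explicit rm commands: this bypasses Git entirely, so the backup copy is the only way back.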

Now we create one new, totally-empty repository in the java directory, that we're still in:

git init
[Git prints message: Initialized empty Git repository in ...]

We now have our initial, totally-empty repository. I like to create a first commit that contains just a README.txt (and maybe one or two similar files):

echo repository for "insert class name here" > README.txt
git add README.txt
git commit -m "Initial commit"

We're now ready to "complete" homework assignment #1:

git add hw1
git commit
(write a good, proper commit message in editor)

By running git add hw1 when there's no Git repository inside hw1, we add all the files that are in hw1 (including any files in any subdirectories inside hw1).

The git commit command commits what's been stored so far, as updated by our git add. So when we commit the addition of hw1, we get README.txt—which we didn't change, so this commit literally re-uses the previous version of the file—plus all the hw1/* files.

We can now "complete" homework assignment #2 with git add hw2 and committing, and so on. We end up with a single repository in the java/.git directory, containing multiple commits: an initial one with the README file and subsequent ones with each homework assignment added. There is just the one branch name and it holds the hash ID of the last commit.
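Put together, the whole sequence might look like the following sketch. It runs in a temporary directory with placeholder files so it is safe to try; the commit messages, the identity "Student", and the branch name main are assumptions, and in your real folder you would skip the setup lines.

```shell
# Placeholder homework folders in a scratch directory (illustrative only).
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com
mkdir -p "$demo/java/hw1" "$demo/java/hw2"
echo 'class Main {}' > "$demo/java/hw1/main.java"
echo 'class Main {}' > "$demo/java/hw2/main.java"

# One repository, created in the top-level folder only.
cd "$demo/java"
git init -q
git symbolic-ref HEAD refs/heads/main

echo 'repository for "insert class name here"' > README.txt
git add README.txt
git commit -q -m "Initial commit"

# One commit per homework assignment.
git add hw1
git commit -q -m "Add homework 1"

git add hw2
git commit -q -m "Add homework 2"

# Three commits on one branch; the latest snapshot holds every file.
git log --oneline
git ls-tree -r --name-only HEAD
```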

Pushing this to GitHub

Your last problem here is that if you have already created a GitHub repository and put some commits in it, your existing GitHub repository is going to be reluctant to lose those commits. You have several options:

  • You can keep those commits, if you really want to.
  • You can tell GitHub to completely delete that repository, then create a new one with the same name.
  • Or, you can use git push --force from your laptop (or other computer) that has your new repository, so as to command the Git software on GitHub to go ahead and lose the old commits from the old repository.

The general idea here, with the last option, is that we (and Git) find commits by starting from some branch name like master or main. That gives us the hash ID of the last commit, and from there, we have Git work backwards.

Suppose we command (not just ask) some GitHub repository to take a new chain of commits. That is, they had:

A <-B <-C   <--main

We now make a totally new (empty) repository, and put in two commits: an initial commit D and a second commit E, neither of which has the same hash ID as any of those three commits in the original repository:

D <-E   <--main

We run git remote add origin url to set things up so that we can git push to GitHub. If we run:

git push origin main

our Git will send commits D and E to GitHub, then politely ask if they can add commits D-E to their repository. But that would give them:

A <-B <-C

D <-E   <-- main

which, they notice, will mean they no longer have any name by which to find commit C, which means they'll "lose" all three hash IDs. So they will say No! If I do that I'll lose my access to some of my commits!

Your Git software reports this as ! [rejected] main -> main (non-fast-forward): it means they are saying they could lose commits. But that's exactly what you want: you want them to lose A-B-C; those commits are no good! So you can use git push --force origin main, which sends D-E again but this time commands them to make their main point to E.

You have to have permission (GitHub adds a whole set of permissions that base Git fails to provide), but if you own this GitHub repository, you probably already have the right permissions.[9] So they'll obey: they will make their branch name main point to commit E, and "forget" commits A-B-C.[10]
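The rejection and the forced update can be rehearsed without touching GitHub. In this sketch a local bare repository stands in for the GitHub repository; that stand-in, the directory names old and new, and the identity "Student" are all assumptions for illustration.

```shell
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

# A bare repository plays the role of the GitHub repository.
git init -q --bare "$demo/remote.git"

# "Old" repository: commits A, B, C pushed to the stand-in remote.
mkdir "$demo/old" && cd "$demo/old"
git init -q && git symbolic-ref HEAD refs/heads/main
for c in A B C; do
    echo "$c" > file.txt
    git add file.txt
    git commit -q -m "$c"
done
git remote add origin "$demo/remote.git"
git push -q origin main

# New repository: unrelated commits D and E.
mkdir "$demo/new" && cd "$demo/new"
git init -q && git symbolic-ref HEAD refs/heads/main
for c in D E; do
    echo "$c" > file.txt
    git add file.txt
    git commit -q -m "$c"
done
git remote add origin "$demo/remote.git"

# A plain push is refused because it would drop A-B-C from main.
if git push -q origin main 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi
echo "plain push: $first_push"

# Forcing commands the remote to move main to E anyway.
git push -q --force origin main
```

After the forced push, the stand-in remote's main holds the same hash ID as the new repository's main.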


[8] Note that while your OS demands folders with subfolders and files, Git just stores "files with long names that have slashes in them". Git understands the folder-y requirements your OS makes, and can turn hw1/main.java into "file main.java in folder hw1". It will automatically save the OS's hw1/main.java (a file named main.java in a folder named hw1) as the Git file named hw1/main.java.

Normally, you don't need to worry about this whole mess. The time when you do have to worry about it is when you want to store an empty folder in Git, because Git literally can't do that. Git only stores files. There are some tricks for this though: see How can I add a blank directory to a Git repository?.

[9] If you own the repository, the only way you wouldn't have permissions is if you logged on to GitHub and told them to deny permission to yourself. To fix that, log on to GitHub again and tell them to give permission back to yourself.

[10] "Normal" Git setups really do eventually forget (or lose) commits this way. GitHub, however, have their software set up to retain all commits forever. So if you send a bad commit to GitHub, and for whatever reason, you really need it removed, you must contact GitHub support and get them to scrub it off their systems.

torek