I was working on a project with a friend, but instead of cloning the repo I downloaded it as a zip and made several changes. In the meantime, my friend updated the remote one. Now my code is outdated and I can't commit my changes, because GitHub doesn't recognize the differences. I tried cloning into a different location and then copying the modified files into the clone. That way I can commit my changes, but with the outdated code (without the new changes made by my friend). Any suggestions would be appreciated.

khalil
  • Hi khalil. Could you include the output of the error messages you are getting? Also, could you detail in your question whether or not the .zip contains the `.git` directory? – tjheslin1 Oct 15 '21 at 17:14

2 Answers

A zip file is not a Git repository, and cannot be used as one.

A Git repository is, at its heart, really a big collection of commits. Each commit is like an entire zip file of source files. So a repository is, in effect, an ever-expanding set of many zip files.

What you can do, to deal with this "correctly"—which may be slightly to very painful, depending on what you did with the original zip file and/or your overall programming ability—is this, which starts with what you did try:

  1. Keep the files you have now somewhere, out of the way.
  2. Use git clone to make, for yourself, your own Git repository. This clone will be filled with all the commits that are in the repository you're copying.
  3. Somehow find the original commit from which you had Git make a zip file. Create, for your own use, a new branch that selects this commit.

We'll come back to step 3 in a while, but first we should talk more about commits. The brief one-sentence description above ("like a zip file") is not wrong, but doesn't capture the true essence of a commit. (If you're impatient and already know all this, scroll to the end.)

What a Git commit is

Each Git commit is:

  1. Numbered. Every commit anyone ever makes, anywhere, gets a unique number. To make this work, the numbers are huge, and seemingly random (although they're actually just outputs from a cryptographic hash function). They're quite useless for humans, but the commit number is how Git finds a commit, so Git needs them. Git calls these hash IDs, or more formally, object IDs.

  2. Made up of essentially two parts. The two parts of a commit are:

    • all the files, as if in a zip archive (but stored quite differently); and
    • some metadata, or information about the commit itself: who made it, when, why (their log message), and so on.

Inside the metadata, Git keeps, for itself, one crucially important piece of information: each commit remembers the raw hash ID of some previous commit. Actually, it's IDs, plural, of a set of commits, but most commits have just one in here. Git calls that remembered hash ID the parent commit, and the commit itself is a child of that parent.

Because the hash IDs are cryptographic hashes of the complete contents of each commit, it's impossible to change anything about any commit after it's made. And, because the hash IDs are unpredictable—they include such things as the time at which the commit is made—it's impossible to include the hash ID of a future commit in any commit we make. So commits necessarily remember only their parents, never their children.
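To see a hash ID and the remembered parent for yourself, a small sketch like the following may help. It assumes a reasonably recent Git and a Unix-like shell; the throwaway repository, the file name and the "Demo" identity are made up for illustration. git cat-file -p prints a commit object's raw contents: the tree (the snapshot), the parent's hash ID, the author and committer metadata, and the log message.

```sh
set -e
# Hypothetical demo: a throwaway repository in a temporary directory.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two >> file.txt
git add file.txt
git commit -q -m "second"
second=$(git rev-parse HEAD)

# Raw commit object: tree, parent, author, committer, then the message.
git cat-file -p HEAD

# The "parent" line holds the first commit's full hash ID.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "first=$first"
echo "second=$second (parent=$parent)"
```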

The result of all this is usually a simple, linear chain of commits:

... <-F <-G <-H

where H stands in for the actual hash ID of our latest commit, whatever that is. Commit H contains, in its metadata, the raw hash ID of earlier (parent) commit G, so we say that H points to G. Meanwhile, G is also a commit, so it also has metadata, which contains the raw hash ID of its parent F: G points to F. F in turn points backwards to some still-earlier commit, and so on.
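Following those backward arrows can be sketched in shell: start at the latest commit and keep asking for its parent with the <commit>^ syntax until a root commit (one with no parent) is reached. This is only an illustration in a hypothetical scratch repository; in real use, git log or git rev-list does this walk for you, and git log --format='%h %p %s' shows each commit next to its parent's hash ID.

```sh
set -e
# Hypothetical demo repository with a linear chain F <- G <- H.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
for name in F G H; do
  echo "$name" > file.txt
  git add file.txt
  git commit -q -m "$name"
done

# Start at the latest commit and follow parent pointers backwards.
# For the root commit, "<c>^" does not exist, so rev-parse prints
# nothing and the loop ends.
c=$(git rev-parse HEAD)
walked=""
while [ -n "$c" ]; do
  subject=$(git log -1 --format=%s "$c")
  walked="$walked$subject "
  echo "$c $subject"
  c=$(git rev-parse --verify --quiet "$c^" || true)
done

# The same links in one command: abbreviated hash, parent hash, subject.
git log --format='%h %p %s'
```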

This is the history in a repository, which is nothing more than the commits in the repository. All (all?) we have to do is somehow find the latest one: H in the drawing above. But note that history can diverge:

          I--J   [commits you might make starting from H]
         /
...--G--H
         \
          K--L   [commits your friend makes, also starting with H]

Which of these commits is "the latest"? The answer is really both: J is your latest, and L is their latest.

This is one form of branching. Now, it's true that you'll be making your commits in your clone, and your friends will make their commits in their clones, but at some point, someone has to reconcile all these clones. There are many tools for dealing with this but we won't even start on them here; this is really just to point out the many meanings of the word branch. This word is badly over-used in Git, but to some extent, we're stuck with it.

A better overview of a Git repository

I already said that a repository is, at its heart, a collection (a database) of commits, and that's true, but again it doesn't capture how we'll use this. In fact, a repository is more like several databases: one of commits and other internal Git objects, plus one other big one and a lot of smaller ones as well.

Since humans are bad at remembering hash IDs, Git gives us an easy way out of this: it provides a names database. These names include, but are not limited to, what Git calls branch names. A name, in Git—a branch name, or a tag name, or the things Git calls remote-tracking branch names that I call remote-tracking names (since they're not actually branch names at all)—each name in Git serves the purpose of storing one hash ID.

That's all we need! One hash ID suffices. When the name is a branch name, that one hash ID is, by definition, the latest commit "on" that branch:

          I--J   <-- my-feature
         /
...--G--H   <-- main
         \
          K--L   <-- bob-feature

Here, commit H is the latest commit on main. It's not the latest commit ever: I, J, K and L are all later. But it's the latest on main, and that's how a Git branch is defined. Commit J is the latest on my-feature.

The actual set of commits "on" some branch is all the commits we can find by starting at the end and working backwards. So commits up through H are on all three branches. If you are used to other version control systems, this idea that commits are on many branches all at the same time may be downright weird. But that's how Git works.
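Here is a hedged sketch that rebuilds the my-feature / main / bob-feature drawing in a scratch repository (all names hypothetical) and asks Git which branch names each hold which hash ID, and which branches contain a given commit. git for-each-ref --contains lists every branch name from which that commit can be reached by working backwards.

```sh
set -e
# Hypothetical demo repository reproducing the drawing above.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
H=$(git rev-parse HEAD)
git checkout -q -b my-feature
commit I; commit J
git checkout -q -b bob-feature main
commit K; commit L
git checkout -q main

# Each branch name stores exactly one hash ID: its latest commit.
git for-each-ref --format='%(refname:short) -> %(objectname:short)' refs/heads/

# H is "on" all three branches; J is only on my-feature.
on_H=$(git for-each-ref --contains "$H" --format='%(refname:short)' refs/heads/)
on_J=$(git for-each-ref --contains my-feature --format='%(refname:short)' refs/heads/)
echo "H is on:" $on_H
echo "J is on:" $on_J
```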

The other thing about branch names is that they move. If commits I-J seem right, we can make them be on main now by moving the name main forward along the I-J line:

          I--J   <-- main, my-feature
         /
...--G--H
         \
          K--L   <-- bob-feature

Now all commits up through J are on the two branches, while commits K-L are only on bob-feature. Or, if this was a mistake, we can force the name main to move back two steps to H again.
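Moving a branch name without touching any commit might look like this in a scratch repository (names hypothetical). git merge --ff-only slides main forward along I-J, and git branch -f forces it back to H. git branch -f refuses to move the branch that is currently checked out, which is why the sketch switches to my-feature first.

```sh
set -e
# Hypothetical demo repository: main at H, my-feature at J.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
H=$(git rev-parse HEAD)
git checkout -q -b my-feature
commit I; commit J
J=$(git rev-parse HEAD)

# Move main forward along I-J: only the stored hash ID changes.
git checkout -q main
git merge -q --ff-only my-feature
after_forward=$(git rev-parse main)

# Changed our mind: force main back two steps, to H.
git checkout -q my-feature
git branch -f main "$H"
after_back=$(git rev-parse main)
echo "main: $H -> $after_forward -> $after_back"
```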

So this tells us how we use branch names in a Git repository: they help us—and Git—find commits, by finding whichever commit we want to claim is the latest for that branch. The commits themselves do not, and cannot, move: they are all set in stone. (We can change how we draw them: there's no reason we have to put my-feature on the top row, for instance, or we can draw vertically with newer commits higher up, or lower down, or whatever we like. But the commits themselves are actually immutable.)

Your working tree and the index

If a commit holds a snapshot and is immutable—and it does and is—how are we to get any actual work done? In fact, the files inside a commit are not only frozen for all time and compressed (as they would be in a zip archive), but are also de-duplicated across the entire repository contents, and are in a form that only Git itself can read. So, just like any archive, we have to have Git extract the files from a commit before we can use them.

A normal repository therefore provides a work area—which Git calls a working tree or work-tree—where you can do your work. When we check out a commit, Git fills in this working tree from the files saved in the snapshot.

Since most of what we do with Git involves making new commits—adding more history to the repository—you will now generally modify some of these files, and maybe create new files and/or remove existing files, and then you will want Git to make a new commit from the updated files. In most version control systems, this is straightforward: you just run their "commit" verb. Git is not straightforward here.

For various reasons, which you may someday agree with or not, Git now imposes upon you this thing that Git calls, variously, the index, or the staging area, or (rarely these days) the cache. Why there are three names for this thing is a bit mysterious: I think it's because the original name index is poor, and the name cache is worse. The name staging area at least reflects how you use it, most of the time. But it isn't all that descriptive. My own one-line description for Git's index is that the index holds your proposed next commit.

In other systems, when you use their commit verb, they look at your working tree to see what you've done. Git, instead, looks at its index. Whatever files are in Git's index, those are the files that go into the commit. This means that the index effectively holds copies of the files that will go into the commit.

Git's internal format de-duplicates files. This is pretty central to making Git efficient: without it, since every commit holds a full copy of every file, your repository would rapidly become obese. But most commits mostly re-use the previous commits' files. By storing just one copy, read-only and compressed—eventually super-compressed—Git keeps the storage requirements reasonable.

Meanwhile, what's in Git's index, aka the staging area, is in this compressed and de-duplicated format. The difference between an index copy of a file and a committed copy is that you can have Git replace the index copy (deleting it, and putting in a different compressed-and-de-duplicated copy instead). The commit can't be changed, but the index can.

So, when you first check out some commit, making it become the current commit, Git fills in your working tree—but also fills in its index, from that commit. Now your proposed next commit matches the current commit.

As you modify working-tree copies (which Git doesn't use), they gradually become different from the index copies. The index copies match the current, or HEAD, commit copies. At some point, though, you're ready to commit some file or files. At this point you must run git add on the files.¹

What git add does is simple, once you know about the index. It:

  • reads the working tree copy of the file;
  • compresses it and checks for duplicates; and
  • updates Git's index appropriately.

If the file is a duplicate, Git tosses out the compressed copy it just made and re-uses the old one instead. If it's not a duplicate, Git arranges for the file to get stored forever once the commit gets made, and updates the index with that new internal object. Either way, the index copy now matches the working tree copy, except that the index copy is ready to be committed. Or, to put it another way: git add updates your proposed next commit.
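A small sketch of the index at work, in a hypothetical scratch repository: git ls-files -s lists the index entries (mode, blob hash ID, stage number, path), and the :path revision syntax names the blob the index currently holds for a file. It shows the index copy matching HEAD until git add, then matching the working tree, and two files with identical contents sharing one stored blob.

```sh
set -e
# Hypothetical demo repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > a.txt
git add a.txt
git commit -q -m "add a.txt"
git ls-files -s

# Change only the working-tree copy: the index copy still matches HEAD.
echo world >> a.txt
index_before=$(git rev-parse :a.txt)
head_blob=$(git rev-parse HEAD:a.txt)

# git add stores a blob for the new contents and updates the index entry.
git add a.txt
index_after=$(git rev-parse :a.txt)
worktree_blob=$(git hash-object a.txt)

# De-duplication: identical contents give the same blob hash ID,
# so the second file re-uses the object already stored.
cp a.txt b.txt
git add b.txt
b_blob=$(git rev-parse :b.txt)
git ls-files -s
```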


¹ You can, if you like, use git commit -a as a shortcut. If you're new to Git, this is tempting. Don't do it! It's a trap! It lets you avoid thinking about Git's index, but eventually Git will slap you in the face with some surprising aspect of the index. You need to have Git's index on your mind, even if it's just a sort of background presence.

Still, it's worth mentioning that what git commit -a does is, in effect, turn git commit into git add -u && git commit. That is, first Git tries to update its index the way git add -u would. Then, once that succeeds, commit goes on to its normal action. There's a bunch of tricky stuff here though, having to do with pre-commit hooks and other issues. It's better to avoid git commit -a as a beginner, and once you're an advanced Git user, you'll often still want to avoid git commit -a for other reasons.
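The git add -u half of that equivalence can be checked in a scratch repository: -u stages modifications and deletions of files Git already tracks, but leaves untracked files alone. A minimal sketch, with hypothetical file names:

```sh
set -e
# Hypothetical demo repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > kept.txt
echo b > gone.txt
git add kept.txt gone.txt
git commit -q -m base

echo more >> kept.txt     # modify a tracked file
rm gone.txt               # delete a tracked file
echo c > brand-new.txt    # create an untracked file

# -u updates the index for tracked files only.
git add -u
staged=$(git diff --cached --name-status)
untracked=$(git ls-files --others --exclude-standard)
echo "$staged"
echo "still untracked: $untracked"
```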


git status, untracked files, and .gitignore

Before we actually simulate a git commit it's worth a brief look at the git status command and what Git calls untracked files. Untracked files can be "ignored", which is kind of a misnomer. Tracked files—the files that are not untracked—can't be ignored like this.

Because your working tree is yours, and because it's just an ordinary directory (or folder, if you like that term better) on your computer holding ordinary files, you can do anything you like here without Git knowing what you're doing.

That, plus the fact that you have to run git add on files before Git even bothers to see that you have done anything, makes working with Git painful. To decrease the pain level, we have git status. What git status does is simple to describe once you understand what git diff does. (What git diff does is ... less simple if we were to cover all the details, but for now I'll just assume you know.)

What git status does, in part, is to run two git diff --name-status commands for you. The first one compares HEAD—the current commit—vs Git's index. It doesn't show the actual differences, but for any file that is the same, it says nothing at all, and for any file that is different, it says so.

This means that you can instantly tell what files you've changed in Git's index. Those files are different in your proposed next commit. If you do commit them now, they will be different in your new commit.

Files that aren't mentioned here must be the same in the current and proposed-next commits ... or, perhaps they're not in HEAD and Git's index at all. Perhaps they are all new files. If they are in Git's index, they will show up here as a "new file". This section of git status output lists these as files to be committed, and each one is either new, modified, or deleted: it's new in the index, or both HEAD and index have the file and they're different, or it's in HEAD but not in the index.

Having gathered up that list for you, git status now goes on to run a second diff. This time it compares the files that are in Git's index to the files that are in your working tree. Once again, we can have:

  • Files that are the same: they're in both, but match. Git says nothing about these.
  • Files that are different: they're in both but don't match. Git says these files are not staged for commit.
  • Files that have gone missing from the working tree: they're in the index, but not visible to you. The index contents are not directly observable, but you can view your working tree contents with any ordinary file listing and viewing command, because those are ordinary files. This particular file isn't there any more, so Git says this file is deleted (but still not staged for commit).
  • Files that are all new: they're not in the index, but are in your working tree.

This last group of files has a special status. These are your untracked files. Any file that is in your working tree, but is not in Git's index right now, is an untracked file. The git status command separates out this last listing, of untracked files, from the first two:

$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   worktree.h

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   Makefile
        deleted:    zlib.c

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        newfile.x

Here, I modified worktree.h and ran git add on it. So the HEAD and index copies are different, and we see that in the Changes to be committed section.

I modified Makefile but didn't git add this, removed zlib.c and didn't git add the removal, and created an all new file newfile.x and didn't git add the file. So in the section titled Changes not staged for commit, Git lists Makefile as modified, and zlib.c as deleted. But it doesn't list newfile.x as added here. Instead, that's down in the Untracked files section.
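The three parts of that git status output can be approximated with separate commands. This sketch recreates the example in a scratch repository (same file names, invented contents) and runs the HEAD-vs-index diff, the index-vs-work-tree diff, and git ls-files --others --exclude-standard for the untracked list. It is only an approximation: git status does more, such as comparing with the upstream branch.

```sh
set -e
# Hypothetical demo repository mirroring the status example above.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
printf 'a\n' > worktree.h
printf 'b\n' > Makefile
printf 'c\n' > zlib.c
git add .
git commit -q -m base

echo change >> worktree.h
git add worktree.h        # staged modification
echo change >> Makefile   # unstaged modification
rm zlib.c                 # unstaged removal
echo new > newfile.x      # untracked file

# 1. HEAD vs index: "Changes to be committed"
staged=$(git diff --cached --name-status)
# 2. index vs working tree: "Changes not staged for commit"
unstaged=$(git diff --name-status)
# 3. in the working tree but not in the index: "Untracked files"
untracked=$(git ls-files --others --exclude-standard)
printf '%s\n---\n%s\n---\n%s\n' "$staged" "$unstaged" "$untracked"
```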

The untracked files here are separated out primarily for one reason: many things for which we use Git create a lot of untracked files. We need a mechanism by which we can tell Git two things:

  • don't complain about this file, and
  • if I use an en-masse git add . or similar to just add everything, don't add this file either.

(We haven't really covered the en-masse "add everything" operations but they're very handy, once git status shows the right stuff. We can add everything, or everything in some particular sub-directory, or whatever. Git won't add any untracked file that we also told it to shut up about.)
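A hedged sketch of .gitignore behaviour in a scratch repository with made-up files: main.o matches the pattern *.o, so git status stays quiet about it and git add . skips it, while git check-ignore -v reports which rule matched. The last part shows that once a file is tracked (here forced in with git add -f), the ignore rule no longer hides its changes.

```sh
set -e
# Hypothetical demo repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'int main(void) { return 0; }' > main.c
echo 'binary junk' > main.o
printf '*.o\n' > .gitignore

git status --short                  # main.o is not listed
ignored_rule=$(git check-ignore -v main.o)
echo "$ignored_rule"                # .gitignore, line 1, pattern *.o

git add .                           # the en-masse add skips main.o
tracked=$(git ls-files)
git commit -q -m base

# Tracked files cannot be ignored this way.
git add -f main.o
git commit -q -m "track main.o anyway"
echo more >> main.o
modified=$(git status --porcelain main.o)
echo "$modified"
```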

That's a lot of detail to absorb, so let's stop here and move on to the next section.

Making a new commit from Git's index

Once you've arranged Git's index the way you want, so that git status prints what you want it to print—the files you intend to show up, do show up in the to be committed section, and they have what you want them to have in them and don't show up in the not staged for commit section—you can simply run:

git commit

Git will now collect all the metadata it needs for the new commit. In particular, Git will:

  • read your user.name and user.email settings to decide what to put into this part;
  • use your computer's clock to figure out what time and day it is, for the time-and-date part;
  • collect a commit log message from you, typically by opening up an editor on .git/COMMIT_EDITMSG (you can use -m to short-cut this); and
  • use the current commit's hash ID as the parent of the new commit.

Git will also turn all the files that are in the index into a new frozen-for-all-time snapshot to put in the new commit, and will then write all of this out as the new commit, which gets a new and unique hash ID.

Now, let's suppose that, at this point, we have this situation:

...--G--H   <-- main, my-feature (HEAD)

That is, we have two existing branch names, main and my-feature, that both select commit H. We're using the name my-feature. That means the current commit is commit H. Git will have filled in our working tree and its index from whatever is in commit H. Since then, we updated Git's index.

The git commit command has now taken the index contents, frozen them, added the necessary metadata, and written out a new commit, which got some new, unique hash ID, but we'll just call it "commit I" here:

...--G--H   <-- main
         \
          I   <-- my-feature (HEAD)

The last step of git commit is that it writes I's actual hash ID, whatever that is, into the current branch name. Since HEAD is attached to the name my-feature, that's the branch name that gets updated. So now the name my-feature points to commit I. By definition, our new commit I is now the latest commit on branch my-feature.
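Those steps can be spelled out roughly with Git's plumbing commands: git write-tree freezes the index into a tree object, git commit-tree wraps it with metadata and the parent hash ID, and git update-ref stores the result in the current branch name. This is a simplified sketch in a hypothetical scratch repository, for illustration only; the real git commit also runs hooks, handles the message editor, and more.

```sh
set -e
# Hypothetical demo: main and my-feature both at H, HEAD attached to my-feature.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
echo H > h.txt
git add h.txt
git commit -q -m H
H=$(git rev-parse HEAD)
git checkout -q -b my-feature

# Update the index: the proposed next commit.
echo "new work" > notes.txt
git add notes.txt

tree=$(git write-tree)                                  # freeze the index
I=$(echo "commit I" | git commit-tree "$tree" -p HEAD)  # metadata + parent
git update-ref refs/heads/my-feature "$I"               # move the current branch name

git log --oneline --graph --all
```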

Your branch names are yours; remote-tracking names remember theirs

We now come to one other place where Git is a little weird, compared to many other version control systems. On many systems, a branch name is a very solid thing, that lasts forever, and everyone who clones a repository uses the same branch names everywhere. That's not the case in Git! Instead, the branch names in a Git repository are specific to that one repository.

(They have to be, because of that trick where our new commit's hash ID just goes straight into the branch name. We can only update our repository, not anyone else's, at this time.)

So, when you run git clone to copy some repository to your own laptop, or wherever the clone is going, your Git copies all their commits, but none of their branch names. Instead of taking their branch names to make yours, your Git takes their branch names and renames them. Technically they become something that's not a branch name at all: it's a remote-tracking name instead.

If you call the other Git origin—that's the standard name for "the other Git", when there's just the one other Git that you git clone-d from—your Git will take their main and turn it into your origin/main. Your Git will take their feature and turn it into your origin/feature. Your Git will turn their branches into your origin/* remote-tracking names.

(Git calls these remote-tracking branch names, as I mentioned before. But they're not branch names at all. They're just your Git's way of remembering someone else's branch names. In that other repository, they don't have origin/ in front of them. That's why I just call them remote-tracking names: your Git is remembering some other repository's branch names, but not as branch names.)

Having copied all their commits, and turned all their branch names into remote-tracking names, your Git now has one problem: You have no branch names. What name will your Git use to attach HEAD to? There isn't any name!

The usual solution to this dilemma is that Git now creates one branch name in your repository. Which one? Well, that's what git clone -b is for: you tell your Git which name to create, based on one of their branch names. If you don't use -b—and most people don't—your Git asks their Git what name they recommend. That tends to be master or main (depending on who is hosting the Git repository you are cloning). So they recommend their main, for instance, and your Git now makes your own main from your origin/main, which remembers their main (whew!):

...--G--H   <-- main (HEAD), origin/main

Your Git now checks out this one branch name and all is normal: your current branch name is main and your current commit is whatever commit main selects. (In this case I've drawn it as hash H as usual.)

If they have other branches, your Git repository might look more like this:

       I--J   <-- origin/feature1
      /
...--G--H   <-- main (HEAD), origin/main
         \
          K   <-- origin/feature2

Each of your remote-tracking names exists to find their latest commit, just as each of their branch names, in their repository, exists to find what is for them the latest commit.
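Cloning a scratch "upstream" repository shows the renaming: the clone gets one branch name, while every upstream branch shows up under refs/remotes/origin/. The repository paths and branch names are hypothetical, and a local path stands in for the URL.

```sh
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)

# "Their" repository (hypothetical) with three branch names.
git init -q "$base/upstream"
cd "$base/upstream"
git symbolic-ref HEAD refs/heads/main
echo H > f.txt
git add f.txt
git commit -q -m H
git branch feature1
git branch feature2

# Clone it: commits are copied, branch names become origin/* names.
cd "$base"
git clone -q upstream mine
cd mine
local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads/)
remote_names=$(git for-each-ref --format='%(refname:short)' refs/remotes/)
echo "branch names: $local_branches"
echo "remote-tracking names:"
echo "$remote_names"
```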

Later, you can run git fetch. When you do, your Git looks up theirs by name (origin: there's only the one other Git repository involved, so there's just the one standard name for it), calls up the URL listed under the name origin, and asks them what their branch names and latest commit hash IDs are. If those latest commits match the remote-tracking names in your repository, there's nothing to do. If not, your Git can get any new commits from them, using the hash IDs. Now your repository has all of their commits, plus any of your own that you still haven't given to them. Your Git now updates your remote-tracking names to remember their new latest commits.
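A similar sketch for git fetch: after the upstream repository gets a new commit, git fetch copies it and moves origin/main, but leaves our own main where it was. Paths and commit contents are hypothetical.

```sh
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)

git init -q "$base/upstream"
cd "$base/upstream"
git symbolic-ref HEAD refs/heads/main
echo H > f.txt
git add f.txt
git commit -q -m H

cd "$base"
git clone -q upstream mine
cd mine
before=$(git rev-parse origin/main)

# They make a new commit in their repository...
(cd "$base/upstream" && echo I >> f.txt && git add f.txt && git commit -q -m I)

# ...and git fetch brings it over and updates origin/main only.
git fetch -q origin
after=$(git rev-parse origin/main)
upstream_main=$(git -C "$base/upstream" rev-parse main)
mine_main=$(git rev-parse main)
echo "origin/main: $before -> $after; main still at $mine_main"
```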

We're finally ready to tackle your problem

Let's draw what you did:

  • Get a zip file, downloading it to your laptop. This is an archive made from some commit in their repository:

    ...--G--H   <-- main
    

    Your zip file thus represents commit H. It's missing the metadata, but it has all the files from the snapshot.

  • Extract the zip file. You now have all the files from commit H, but no Git repository.

  • Work on the files.

  • Discover the mistake, and clone the repository itself to your laptop.

You now have a repository somewhere on your laptop, plus a folder full of files from commit H, but modified, somewhere else on your laptop. The repository you have now might look more like this:

...--G--H--I--J--K   <-- main (HEAD), origin/main

What you want to do, for cleanliness purposes, is find which commit is commit H.

You can run git log, which will spill out commits one by one. If they have branches and merges this gets complicated, and you should read through Pretty Git branch graphs, but if not, you can maybe just search by date or something, to find commit H. The actual hash IDs will be large, ugly, and random-looking, so they won't help any. (To use them, you probably want to cut-and-paste with your mouse: it's really error-prone to try to type one in!)

There's a possible short-cut. If you still have the original zip file, look at its metadata. There is a file comment holding the actual hash ID. Grab it (with the mouse or whatever) and you're golden! If not, how you find the right hash (the one I'm calling H here) is up to you. Another trick you can use is this: git diff can compare any commit to a file-tree that's outside the Git repository, if you point Git at that tree with --work-tree. With git log, you get hash IDs; from inside the repository, you can run:

git --work-tree=/path/to/unzipped/files diff <hash>

and get a diff listing. If the only changes you see are your changes, the hash here is probably H. You can use git log's dates to get a short-list of candidates for this kind of git diff, and then use trial-and-error to find the closest commit.
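If you'd rather not try candidates by hand, the search can be scripted. This sketch (hypothetical paths and commits, with the zip download simulated by git archive) snapshots the unzipped folder into a tree object through a throwaway index file set with GIT_INDEX_FILE, so the repository's real index is untouched. It then measures the diff against each of the latest 20 commits and picks the smallest. Diff size is a crude heuristic, so treat the winner as a candidate to check, not a certainty.

```sh
set -e
# Hypothetical demo repository: main is F-G-H-I-J-K, the "zip" came from H.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { echo "$1" >> app.txt; git add app.txt; git commit -q -m "$1"; }

commit F; commit G; commit H
H=$(git rev-parse HEAD)
zipdir=$(mktemp -d)
git archive HEAD | tar -xf - -C "$zipdir"     # simulated zip download of H
echo "my change" > "$zipdir/mine.txt"         # your edits to the unzipped files
commit I; commit J; commit K                  # meanwhile, main moves on

# Snapshot the unzipped folder as a tree object via a throwaway index.
tmpidx="$(mktemp -d)/index"
(cd "$zipdir" && GIT_INDEX_FILE="$tmpidx" git --git-dir="$repo/.git" --work-tree=. add -A)
ziptree=$(GIT_INDEX_FILE="$tmpidx" git write-tree)

# Compare each recent commit with that tree; the smallest diff wins.
best="" best_size=""
for h in $(git rev-list --max-count=20 HEAD); do
  size=$(git diff "$h" "$ziptree" | wc -l | tr -d ' ')
  echo "$h $size"
  if [ -z "$best_size" ] || [ "$size" -lt "$best_size" ]; then
    best=$h best_size=$size
  fi
done
echo "closest commit: $best"
```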

Assuming you have found hash ID H, all you have to do now is create a new branch name that points directly to this hash ID. To do that, use the git branch command:

git branch branch-xyzzy <hash>

(pick a better branch name, and use the mouse again for the cut-and-paste of hash IDs). Now you have, in your repository:

...--G--H   <-- branch-xyzzy
         \
          I--J--K   <-- main (HEAD), origin/main

You can now run git checkout branch-xyzzy:

...--G--H   <-- branch-xyzzy (HEAD)
         \
          I--J--K   <-- main, origin/main

The files in your working tree are now those from commit H. Copy in the files from the place where you worked on the zip archive, use git diff and/or git status to figure out what to git add (or just run git add . and then git status), and you're ready to commit! Your new commit will get a new, unique hash ID, and the name branch-xyzzy will point to it:

...--G--H--L   <-- branch-xyzzy (HEAD)
         \
          I--J--K   <-- main, origin/main

or, equivalently:

...--G--H--I--J--K   <-- main, origin/main
         \
          L   <-- branch-xyzzy (HEAD)

Note how we can re-draw the drawing without actually changing any commits. Get used to the fact that graph drawings morph a lot: whatever graph-drawing software you use, such as the graph view built into GitKraken (which you tagged your question with), will have its own preferences. They may or may not match up with yours. The important things are the arrows from commits to earlier commits, and the various names that point to particular commits. The arrows from commit to commit can't change, because no part of any commit can change, but the arrows from names can.

We make use of the last bit a lot. For instance, now that you have:

...--G--H--I--J--K   <-- main, origin/main
         \
          L   <-- branch-xyzzy (HEAD)

you might want to use git rebase. This copies commits, to new and improved ones. Commit L might be fine, but it might be better if it built on commit K. You can't actually do that, but you can make a new and improved commit—let's call it L'—that does do that:

                   L'  <-- improved-branch-xyzzy
                  /
...--G--H--I--J--K   <-- main, origin/main
         \
          L   <-- old-branch-xyzzy

If we now delete the old name, making commit L hard to find, and use the name over again to point to L' instead of L:

                   L'  <-- branch-xyzzy (HEAD)
                  /
...--G--H--I--J--K   <-- main, origin/main
         \
          L   ???

and then use any commit viewer to look at the commits, it will seem as though we changed commit L. The new copy, L', has a different hash ID, and points backwards to K, but makes the same changes that H-vs-L would show. It has the same commit message that L had. So if we don't remember the hash ID—and nobody ever does—we might not even know this happened!
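The whole recovery, from creating the branch at H through the optional rebase, can be rehearsed in a scratch repository before you try it on the real one. Everything here is hypothetical: the friend's commits I-K are made directly on main, the zip download is simulated with git archive, and branch-xyzzy is the placeholder name from above. Note that cp only adds and overwrites files; anything you deleted in the unzipped copy would still need a git rm.

```sh
set -e
# Hypothetical demo repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
echo readme > README
git add README
commit() { echo "$1" >> app.txt; git add app.txt; git commit -q -m "$1"; }

commit F; commit G; commit H
H=$(git rev-parse HEAD)
zipdir=$(mktemp -d)
git archive HEAD | tar -xf - -C "$zipdir"     # simulated zip download of H
echo "my edit" >> "$zipdir/README"            # your work on the unzipped files
commit I; commit J; commit K                  # your friend's new commits
K=$(git rev-parse HEAD)

# New branch name at H, then check it out.
git branch branch-xyzzy "$H"
git checkout -q branch-xyzzy

# Copy the edited files over H's files, stage, and commit: this is L.
cp -R "$zipdir"/. .
git add .
git status --short
git commit -q -m "My changes from the zip download"
L=$(git rev-parse HEAD)

# Optional: copy L to a new-and-improved L' on top of K.
git rebase -q main
Lprime=$(git rev-parse HEAD)
git log --oneline --graph --all
```

The old commit L is not destroyed by the rebase; for a while it stays findable through git reflog.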

torek

No need to overcomplicate. Any folder can be a git repo, as long as it has the .git folder. I did the following:

  • clone the Github repo to another location
  • move the .git folder into your downloaded zip folder using the command line (mv .git/ ~/new-location/.git/)

Now create a new branch, commit everything, merge in main/master and fix all conflicts. Then commit again and you're done.