I have been asked to do this on GitHub:

"It looks like you didn't configure your git correctly so your commit is associated with an anonymous user:"

I am new to open source and Git. Can anyone please help me out? Also, after making the changes in the Git settings, do I need to git push again, or do I only need to git commit?

  • Which git client are you using? If you are using the git cli, then see here: https://stackoverflow.com/questions/46941346/how-to-know-the-git-username-and-email-saved-during-configuration. See the top answer first and then the question. – TheIceBear Oct 04 '21 at 19:22
  • I have completed whatever was listed there. Now do I need to push my changes again to the repo? – Ayush Singh Bhadoria Oct 04 '21 at 19:28
  • Future commits will have the right configuration. I would not ask you to commit again if I was the maintainer on the repo you are working on. It doesn't make sense to commit again unless you first delete what you already pushed. Ask the same person if you are unsure. – TheIceBear Oct 04 '21 at 19:35
  • The owner of the repo has asked me to commit again because I haven't abided by the commit message guidelines. Now my question is: on my local machine I already have an updated file which I made 2 days ago, so when I open my terminal, do I only need to choose that repo using cd, run git commit -m "---" and then git push, or is there something else I should do? What will happen to that old commit then? I am just completely perplexed. – Ayush Singh Bhadoria Oct 04 '21 at 20:28
  • It might be that the configuration in Git and the one on GitHub do not match each other. Could you please confirm that and check again... – Xab Ion Oct 04 '21 at 20:42
  • Sorry, I don't get it? – Ayush Singh Bhadoria Oct 04 '21 at 20:45
  • It could be helpful for you. https://crunchify.com/how-to-git-change-commit-email-and-username-fix-unrecognized-author-commit-message-on-github/ – Xab Ion Oct 04 '21 at 20:48

1 Answer

You seem to have multiple questions here. StackOverflow guidelines say that you should stick to one question per Q&A, so let's focus on this one:

The owner of the [GitHub] repo has asked me to commit again because I haven't abided by the commit message guidelines. Now my question is: on my local machine I already have an updated file which I made 2 days ago, so when I open my terminal, do I only need to choose that repo using cd, run git commit -m "---" and then git push, or is there something else I should do? What will happen to that old commit then? I am just completely perplexed.

Git is a big system and there is a lot to know about it, but to get you started, let's do a quick overview:

  1. Git is all about commits. Git is not about files, although commits store files. Git is not about branches either, although branch names help us find commits. The commit is the reason Git exists: everything else Git does is in service of finding, using, and making new commits.

  2. To achieve this, Git itself is largely a system comprising two databases.

    One—usually the biggest by far—contains commit objects and other supporting objects. These objects, including the commits, have numbers, but the numbers aren't simple counting numbers: they don't go "commit #1", then "commit #2", then 3 and 4 and so on. Instead, each commit gets a unique, but very large, number, expressed in hexadecimal, that looks random (but really isn't).

    These random-looking numbers are pretty useless to humans, who can't keep them straight. So Git provides a second, usually much smaller database, that maps names—such as branch names—to commit hash ID numbers.

    There's a peculiar thing about these databases. When you use git clone, you make a copy of the big database, with all the commit objects and other objects. You, with your Git repository, will eventually share commits you store in your database with them—with their Git repository and their commit database. So the commits get shared. But you don't actually make a copy of the names database. Instead, you get your own separate names database, with different names in it. These names do not get shared, at least, not exactly.
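To make the two databases a bit more concrete, here is a minimal sketch you can run in a scratch directory. The repository, the file name, and the "Example User" identity are all made up for illustration, and `git init -b main` assumes Git 2.28 or later.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway directory, not your real clone
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

# Names database: the branch name main maps to a commit hash ID.
hash=$(git rev-parse main)
git for-each-ref --format='%(refname) -> %(objectname)'

# Objects database: that hash ID names an object whose type is "commit".
type=$(git cat-file -t "$hash")
echo "$hash is a $type"
```

The hash ID you see will differ from anyone else's, because it depends on the commit's contents, including the date and time.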

Again, what you're going to do is make new commits. You already made one. They don't like something about that one commit, so you're going to have to make at least one more. You also mentioned that you:

already have an updated file ...

which I take to mean updated since the last commit you made.

Because you need to make another new commit, you now have a choice:

  • You can take your existing commit, which the other repository owners don't like, and make a new-and-improved version of that commit that only fixes the name-and-email-address part. Then you can add on, to that commit, another commit that updates this "updated file". That gives you three total commits, of which you want just two to go into the other repository.

  • Or, you can take all your updates from the commit they don't like, and roll this updated file in as yet another update and make a single new commit that updates the files the way you did before except that it also includes this new update. When you make this new-and-improved commit, you'll also fix the name-and-email-address part. That gives you two total commits, of which you want just one to go into the other repository.

This part is your choice, and whether and when to make two replacement (new-and-improved) commits for your one existing commit, or just to make one replacement (new-and-improved) commit for your one existing commit, tends to be a matter of opinion. You and the owners of the other (open source) repository may have different opinions here too; if so, that's something you'll need to resolve. Git provides only the mechanisms here, not the opinions.

What to know about commits

First, you need to remember something you've already read here: every commit has a unique number. This big ugly hash ID, or object ID, is too difficult for humans to remember, but you should learn to recognize commit hash IDs when you see them. They show up in git log output, and you can use your mouse, or whatever, to cut-and-paste them. Since each commit gets a unique number, you can then easily find this commit in your clone, if your clone has it.

For instance, if I run git log in a Git repository for Git itself, I will eventually see ebf3c04b262aa27fbb97f8a0156c2347fecafafb. (That's the commit hash ID for the commit that represents Git version 2.32.0.) This number is now forever reserved for that commit, so you won't ever see it in any repository that isn't a repository holding the commits that make up Git itself.¹ If I run git show on that number, I will see that commit in a clone of the Git repository for Git, but get an error (bad object) if I try that in some other Git repository.
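As a rough illustration of "your clone either has the commit or it doesn't", this sketch asks a brand-new, empty repository whether it holds that hash ID. It uses `git cat-file -e`, which only reports (through its exit status) whether an object exists; the scratch directory is an assumption of the example.

```shell
set -eu
repo=$(mktemp -d)                 # an unrelated, empty repository
cd "$repo"
git init -q

id=ebf3c04b262aa27fbb97f8a0156c2347fecafafb
# cat-file -e exits with status zero only if the object exists here.
if git cat-file -e "$id" 2>/dev/null; then
    found=yes
else
    found=no
fi
echo "commit present in this repository: $found"
```

Run the same check inside a clone of the Git repository for Git and the answer should flip.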

The next things to know about a commit are these:

  • Commits are permanent (mostly) and read-only (completely). This means you cannot change any commit ever, and commits are pretty darn difficult to get rid of. You can make it so that you can't find them, and once commits are truly un-findable, and enough time passes, Git decides that nobody cares about that commit after all, and then it goes away for real. (If you can run git log without using the raw hash ID of the commit, and then find the commit, it never goes away. We call this kind of commit reachable, or more loosely, find-able. Most commits are find-able and therefore permanent.)

  • Each commit holds a full snapshot of every file, much like a tar or zip archive. That's how—and why—Git can get your old versions of files back.

  • The files inside a commit, however, are not ordinary files. They're in a special read-only, Git-only, compressed and de-duplicated form. This makes them so that only Git itself can read them, and literally nothing can write them—which makes them quite useless for getting any actual work done.

    The compressing-and-de-duplicating, though, is very useful. It means that even though every commit contains every file, new commits—which mostly use the same copies of files that were in older commits—take almost no space. The new "copies" of files that are the same as older ones just literally re-use the older file.

  • Besides holding a snapshot, each commit also holds some metadata: stuff like the name and email address of the author of the commit. Like the snapshot, this metadata goes into the commit at the time you (or whoever) make the commit. From then on, it's all read-only.

    This means that the existing "bad" commit you have has the wrong name and/or email address in it. You literally can't fix it—nobody and nothing can change any existing commit—so you will have to make a new and improved commit, and get your Git to use that one instead.

    One piece of that metadata is something Git sets by itself, for itself. Every commit stores the raw hash ID of some set of earlier commits. We call those the parents of this commit. Most commits store exactly one parent commit hash ID. We call them ordinary commits, to distinguish them from merge commits (which hold two parent commit hash IDs) or the root commit, which, being the first commit someone ever made in a new empty Git repository, doesn't have a parent.²
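If you want to see the snapshot reference and the metadata for yourself, `git cat-file -p` prints a commit's raw contents. The sketch below builds a tiny two-commit scratch repository first; the file name and the "Example User" identity are placeholders, and `git init -b main` assumes Git 2.28 or later.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "First commit"            # the root commit: no parent
echo "more" >> README.md
git commit -q -a -m "Second commit"        # an ordinary commit: one parent

# Raw contents: a tree line (the snapshot), a parent line,
# author and committer lines, then the message.
git cat-file -p HEAD

author=$(git log -1 --format='%an <%ae>')
parent=$(git rev-parse HEAD^)
root_parents=$(git log -1 --format='%P' "$parent")   # empty for a root commit
```

The author line is exactly the part the repository owners are complaining about in your case.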

This parent stuff is crucial to Git itself. You often don't need to care about it—Git does that automatically for you—but Git does. It has the effect of linking commits together, albeit backwards.

Suppose we have a tiny repository with just three commits in it. These three commits have big ugly random-looking hash IDs, but to make things easier for ourselves, we'll just call them commits A, B, and C, and say that we made them in that order. So A is the root commit and doesn't point backwards at all. Commit B has commit A as its parent, though: B points backwards to A. Commit C is our most recent commit, and has B as its parent, so C points backwards to B.

We can draw this situation like this:

A <-B <-C

Note how C points to B, which points to A.

Each commit has a full snapshot of every file. By extracting the files from both B and C, Git can now compare the two sets of files. Since the files are de-duplicated, Git can easily tell which files in B and C are the same. So Git can immediately tell us which files we changed from B to C. By comparing the contents of those files, Git can show us a diff: a recipe by which we could reproduce the change. In most cases, that recipe will match the actual change we made (though there are some occasional edge cases where Git shows us something else that just winds up doing the same thing).
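Here is a small sketch of that comparison, using a scratch repository with made-up file names. `HEAD^` names the parent of the current commit, so it plays the role of B while `HEAD` plays the role of C.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "B: add a.txt and b.txt"
echo "two, changed" > b.txt
git commit -q -a -m "C: change b.txt"

# Which files differ between the two snapshots?
changed=$(git diff --name-only HEAD^ HEAD)
echo "changed: $changed"

# The unchanged file is stored once: both commits refer to the same object.
old_a=$(git rev-parse HEAD^:a.txt)
new_a=$(git rev-parse HEAD:a.txt)

# The recipe itself:
git --no-pager diff HEAD^ HEAD
```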

This picture is pretty simple and useful, but has one big problem: the actual hash ID of commit C isn't just a simple single letter like C, but rather some big ugly thing like ebf3c04b262aa27fbb97f8a0156c2347fecafafb. This is where branch names come in.


¹ Technically, the number could be re-used in an unrelated repository. The numbers were intended to be so big as to be always unique, and in practice they generally will be, but by partially breaking SHA-1, people have set things up so that Git is going to have to switch to even bigger hash IDs in the future. C'est la vie.

² Technically, a merge commit has two or more parents, and there can be more than one root commit. But these are both unusual situations, and not something you need to worry about here.


Branch names help us (and Git) find commits

Given a bigger repository—with at least eight commits, say—we might draw part of it like this:

... <-F <-G <-H

Since those arrows always point backwards, and get annoying to draw in text, I tend to get lazy and draw them like this instead:

...--F--G--H

Here H stands in for the hash ID of the latest commit: the one we just made on our branch main, for instance, or the latest one we got from someone else, or whatever.

I don't want to try to memorize a Git hash ID, and I doubt you do either. But we don't have to: we have a computer. We can just have Git do that for us. Let's have Git store hash ID H in the name main, making main point to commit H, like this:

...--F--G--H   <-- main

Now, let's make a new branch name. In Git, each branch name must point to some existing commit. We could pick F or G here, but why would we want to use an old commit? We might have a reason to do that someday, but I don't have one here, so I'll use H here too. I'll make the name feature point to commit H:

...--F--G--H   <-- feature, main

Now we need a way to remember which branch name we're using. Git will do this for us too, by attaching the special name HEAD to just one of the branch names. So if we're using main, we draw that like this:

...--G--H   <-- feature, main (HEAD)

(I got even lazier and stopped drawing F, but it's still in there, as are all the earlier commits. We're just not going to bother finding them, although git log would.)

If we now switch to branch feature, using git checkout feature or git switch feature, this is what happens:

...--G--H   <-- feature (HEAD), main

The name HEAD is now attached to feature. Nothing else happens because both names—feature and main—pick out the same commit right now, namely commit H.
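The same sequence can be tried in a scratch repository. This is only a sketch: the branch name feature matches the drawings above, everything else (directory, file, identity) is invented, and `git switch` needs Git 2.23 or later (use `git checkout` otherwise).

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "H"

git branch feature                         # new name, pointing to the current commit
before=$(git symbolic-ref --short HEAD)    # the name HEAD is attached to
git switch -q feature
after=$(git symbolic-ref --short HEAD)

main_hash=$(git rev-parse main)
feature_hash=$(git rev-parse feature)
echo "HEAD was attached to $before, now to $after"
echo "main=$main_hash feature=$feature_hash"
```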

Git's index and your working tree

I mentioned earlier that the files inside a commit are in a special, read-only, Git-only, de-duplicated form. To actually use those files, Git has to copy them out of the commit. These are the files that we see and work with. They exist within directories (or folders, if you prefer that term) as required by our computer's filing system.³ Git calls this our working tree, because that's where we actually get our work done.

For some reason,⁴ Git also stores extra "copies" of each file in what Git calls the index, or the staging area, or (rarely these days) the cache. I put copies in quotes here because the files in this area are in Git's internal de-duplicated format. The stuff that's in here acts as your proposed next commit. I won't go into any real detail here, but this is another way Git confuses newbies; it's why you have to git add files so often, rather than just adding a new file once and being done with it.

The end result of all this is that there are actually three copies of each file active at all times:

   HEAD        index      work-tree
---------    ---------    ---------
README.md    README.md    README.md
lib.c        lib.c        lib.c
prog.py      prog.py      prog.py

or whatever. The "HEAD" copy is the one in the current commit: Git finds the current commit by reading HEAD to see which branch is the current branch, then reading the branch name to find the commit. That file literally can't be changed, though you can switch from one commit to another. The index copy can be replaced wholesale, using git add. Your working tree copy is the one you work on/with. When you've updated it, you run git add to get Git to swap in a new, pre-compressed, pre-de-duplicated copy (or "copy" if it's a duplicate) of the file, ready to go into the next commit.

When and if you do switch from one commit to another, Git has to remove, from its index and your working tree, all the files that came out of this commit. Git can then plug in all the files that come out of the new commit you switch to. This happens when you use git checkout or git switch and it requires changing commits.
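To watch the three copies drift apart and come back together, you can try something like the following in a scratch repository. The names are placeholders; `git show HEAD:README.md` reads the committed copy and `git show :README.md` reads the index copy.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "version one" > README.md
git add README.md
git commit -q -m "Add README"

echo "second version" > README.md          # edits the working tree copy only
git status --short
unstaged=$(git diff --name-only)                  # working tree vs index
staged_before=$(git diff --cached --name-only)    # index vs HEAD: nothing yet

git add README.md                          # swap the new copy into the index
staged_after=$(git diff --cached --name-only)

head_copy=$(git show HEAD:README.md)       # still the committed copy
index_copy=$(git show :README.md)          # the proposed next commit
work_copy=$(cat README.md)
```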


³ Git's own internal stored files inside commits don't use folders, or at least, not in this way: Git uses an entirely different system, which is why only Git can read these archived files.

⁴ Other version control systems don't do this, which acts as proof that Git didn't have to do this. But Git does this.


We didn't change commits, so nothing happened yet

We started with:

...--G--H   <-- feature, main (HEAD)

and then switched to feature:

...--G--H   <-- feature (HEAD), main

We're still using commit H, so Git didn't change anything yet.

Now let's modify some file(s) in the working tree and run git add. This will update the proposed next commit. Then we'll run git commit.

The git commit operation will:

  • Package up all the files as they appear right now in Git's index: since we git add-ed our changes, those are our updated files, except where the index still has files we didn't git add, in which case they're duplicates of what's in commit H.

  • Add the metadata for the new commit. Git will get our name and email address from the git config settings. Git will get the current date and time from the computer's clock. And, crucial for Git itself, Git will find commit H's real hash ID—the actual big ugly number—and use that as the parent for the new commit.
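Since the name and email address come from the git config settings, this is where your original problem gets fixed for future commits. A minimal sketch, with a placeholder identity that you would replace with your own:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main

# --local affects this repository only; --global would apply to
# every repository for your user account on this machine.
git config --local user.name "Example User"        # placeholder
git config --local user.email "user@example.com"   # placeholder

name=$(git config user.name)
email=$(git config user.email)
echo "Git will record: $name <$email>"

echo "x" > file.txt
git add file.txt
git commit -q -m "Test identity"
recorded=$(git log -1 --format='%an <%ae>')        # what went into the commit
echo "recorded: $recorded"
```

As far as I know, GitHub ties a commit to an account by its email address, so the address should be one that is registered on your GitHub account. Changing the setting affects only commits you make afterwards; existing commits keep whatever was recorded in them.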

Git will now write all of that out, which makes a new commit with a new, unique, big ugly hash ID, which we'll just call I for short:

...--G--H
         \
          I

Now comes the special trick: having made a new commit, Git simply writes the new commit's hash ID, whatever it is, into the current branch name, as found by whichever name HEAD is attached to. So now main still points to existing commit H, but feature points to new commit I:

...--G--H   <-- main
         \
          I   <-- feature (HEAD)

We have a new commit whose parent is the old commit. Our new branch name now selects the new commit. If we now:

git checkout main

we'll get:

...--G--H   <-- main (HEAD)
         \
          I   <-- feature

Git will take out the files from commit I and put in, instead, the files from commit H. Our updated files still exist, and can be found by using the name feature to find commit I. But the files we see and work with are now those from commit H, as found by the name main.
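The whole H-then-I sequence can be replayed in a scratch repository. This sketch uses invented file names and a placeholder identity, and assumes Git 2.28 or later for `git init -b main`.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "H"

git switch -q -c feature                   # create feature at H and attach HEAD
echo "new work" > feature.txt
git add feature.txt
git commit -q -m "I"                       # only feature moves

main_hash=$(git rev-parse main)
feature_hash=$(git rev-parse feature)
parent_of_feature=$(git rev-parse feature^)

git switch -q main                         # files from I out, files from H in
ls
```

After the final switch, feature.txt is no longer in the working tree, but `git show feature:feature.txt` can still read it out of commit I.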

Replacing a bad commit

Suppose there's something wrong with new commit I. This could be a wrong file in the snapshot, or a bad author name or email address, or both. We can't fix a bad commit, but we can always make a new commit.

If we just check out feature and make a new commit, this new commit will add on to the existing commit, just like before:

git checkout feature
... do some work, maybe run `git config` too/instead ...
... git add files if needed ...
git commit

and we'll get:

...--G--H   <-- main
         \
          I--J   <-- feature (HEAD)

The bad commit will remain. What can we do?

Well, one trick is to make a new branch name but start it at commit H again:

...--G--H   <-- main, redo (HEAD)
         \
          I   <-- feature

Now we can re-do whatever we did to make I, but do it right this time, and add and commit:

          J   <-- redo (HEAD)
         /
...--G--H   <-- main
         \
          I   <-- feature

We can now delete the name feature:

          J   <-- redo (HEAD)
         /
...--G--H   <-- main
         \
          I   ???

Commit I still exists, but we can't find it any more. The git log command won't show us its hash ID.⁴ Then we can, if we want, rename redo to feature, and it's as if we did things right all along.
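Spelled out as commands, the redo trick might look like this. It is a sketch in a scratch repository: the "anonymous" identity stands in for the misconfigured one, the file and message names are invented, and `git checkout feature -- work.txt` is just one convenient way to re-use a file from the bad commit.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # the corrected identity (placeholder)
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "H"

# The bad commit I, made with a wrong identity and a poor message.
git switch -q -c feature
echo "work" > work.txt
git add work.txt
git -c user.name="anonymous" -c user.email="anonymous@localhost" commit -q -m "wip"
bad=$(git rev-parse feature)

# Start over from H under a new name, and do it right this time.
git switch -q -c redo main
git checkout feature -- work.txt           # copy the file out of the bad commit
git commit -q -m "Add work.txt"

git branch -q -D feature                   # drop the name that finds I
git branch -m redo feature                 # rename redo to feature
good_author=$(git log -1 --format='%an <%ae>')
```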

There's a slightly easier way to do this though, using git commit --amend. The --amend option does not change a commit: that's literally impossible. What it does is change the way Git makes the new commit.

Normally, with:

...--G--H   <-- main
         \
          I   <-- feature (HEAD)

a new commit adds J to the end of the current branch. The parent of J is I and the name feature now points to J.

With --amend, the name feature will still end up pointing to our new commit J, but instead of pointing back to I, our new commit J will point back to existing commit H. Git simply finds the parent (or parents) of the current commit I, and uses those as the parent(s) of the new commit J. So we get:

          I   ???
         /
...--G--H   <-- main
         \
          J   <-- feature (HEAD)

Note that this only "ejects" the last commit; if we have:

...--H--I--J--K   <-- branch (HEAD)

and run git commit --amend, we get:

             K   ???
            /
...--H--I--J--L   <-- branch (HEAD)

where we no longer have any way to find commit K.
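For your situation, the amend route might look roughly like the sketch below, run here in a scratch repository. The "unknown" identity stands in for the misconfigured one, and the messages are invented. The `--reset-author` option tells Git to take the author from the current configuration rather than copying it from the commit being replaced, and `-m` supplies the new message.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "unknown"             # the misconfigured identity
git config user.email "unknown@localhost"
echo "base" > base.txt
git add base.txt
git commit -q -m "H"

git switch -q -c feature
echo "work" > work.txt
git add work.txt
git commit -q -m "fix stuff"               # bad commit I
old=$(git rev-parse HEAD)

# Fix the configuration first (placeholder values) ...
git config user.name "Example User"
git config user.email "user@example.com"

# ... then replace I with a new commit J whose parent is H.
git commit -q --amend --reset-author -m "Add work.txt"
new=$(git rev-parse HEAD)
author=$(git log -1 --format='%an <%ae>')
count=$(git rev-list --count main..feature)   # commits on feature but not main
```

If the old commit was already pushed, the branch in the other repository still points to it, so a plain git push will normally be rejected; something like git push --force-with-lease is the usual way to ask that repository to take the replacement, but check with the repository owners about what they expect.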

To do much fancier operations, you'll want git rebase, perhaps in its form as git rebase -i. Rebase copies many commits, then uses the new and improved copies instead of the originals. It gets very complicated, though, so we won't cover that here.


⁴ This isn't completely true, but it is how Git will eventually get around to dropping commit I entirely.

torek