
I have a branch and wanted to get the changes from master into it using $ git pull origin master. After I did this, the pull didn't show any merged PR being pulled and said it's already up to date. However, git log shows the last merged PR.

So how can I get the latest changes (a merged PR) on this branch?

Doing the $ git pull origin master on the master branch shows the merged PR being pulled.

How can I fix this problem? The README.md change, whose PR I merged on the GitHub page and which I was able to pull into master with git pull origin master, is not being pulled into this new branch.

$ git branch
  dataprocessing
  master
* toyota

When in the branch:

$ git merge master
Already up to date.

and

$ git branch -vv
  dataprocessing dcaa9f9 Merge pull request #122 from XYZaiXYZ/toyota
  master         dcaa9f9 [origin/master] Merge pull request #122 from XYZaiXYZ/toyota
* toyota         dcaa9f9 [origin/toyota: ahead 1] Merge pull request #122 from XYZaiXYZ/toyota

Additionally, the following yields no results:

$ git diff origin master

This is what I see in the README.md in local branch toyota: [screenshot]

This is what I see in the README.md in the GitHub PR which I merged: [screenshot]

This is what I see when I browse to the actual README.md on the GitHub website: [screenshot]

This is what I see if I $ git checkout master; as you can see, even in master, after pulling the update, the README.md is not changed: [screenshot]

$ git checkout toyota
Switched to branch 'toyota'
Your branch is ahead of 'origin/toyota' by 1 commit.
  (use "git push" to publish your local commits)

$ git merge origin master
Already up to date.

$ git log README.md 
commit ac7cXXXX (origin/toyota)
Author: Mona Jalal <mona@XYZ>
Date:   Fri Feb 5 22:40:32 2021 +0000

    fixed two typos in the README.md

$ git pull origin master
From ssh://github.com/XYZaiXYZ/vision
 * branch            master     -> FETCH_HEAD
Already up to date.

I merged PR #122 into master myself, and this is what I see when I go into the git repo: [screenshot]

$ git checkout master
$ git log
commit dcaa9XYZ (HEAD -> master, origin/master, origin/HEAD, toyota, dataprocessing)
Merge: 3b29485 ac7c61e
Author: Mona Jalal <76495162+XYZ@users.noreply.github.com>
Date:   Fri Feb 5 17:44:36 2021 -0500

    Merge pull request #122 from XYZaiXYZ/toyota
    
    fixed two typos in the README.md

I also cloned the repo into a test directory, and the changes do show up in this new clone: [screenshot]

Mona Jalal
  • Are you on master in your local? Check your current branch using `git branch -vv` that shows mapping details of your local branches and remote branches. – Mohana Rao Feb 05 '21 at 23:56
  • I am in local branch called toyota – Mona Jalal Feb 05 '21 at 23:58
  • @MohanaRao added the results of `git branch -vv` to end of my post – Mona Jalal Feb 05 '21 at 23:59
  • As you are on a different branch, you don't see the PR commit from master. What you are seeing is mostly the commit you created on toyota branch that you had used for creating PR. Try `git checkout master` and `git pull origin` to see the PR commit. – Mohana Rao Feb 06 '21 at 00:07
  • even in the master branch the pulled README.md is not reflecting the changes. please check the updates in the OP – Mona Jalal Feb 06 '21 at 00:12
  • What do you see in `git log` on your local master ( `git checkout master` and `git log` )? – Mohana Rao Feb 06 '21 at 00:35
  • @MohanaRao thanks for your response, Please check end of the OP for an answer to your question – Mona Jalal Feb 06 '21 at 00:40
  • Things look normal except that your changes are not seen. Could you clone the repo into a separate folder and make sure your changes are not visible in the new clone as well? – Mohana Rao Feb 06 '21 at 01:02
  • @MohanaRao yes changes are shown in the new git clone however, I need to use the current one since I have run experiments and have processed lots of data – Mona Jalal Feb 06 '21 at 01:08

3 Answers


git diff origin master yielding no result means your branch is the same as origin/master. So you have pulled master from origin, and your branch is up to date with the master branch.

Also, git merge master merges the changes on master if those changes have been committed locally. If the changes on master were committed on the remote, you need to do git merge origin master to pull the master.

arianoo

Let's take some definite items / hard facts into account here first:

  1. Git isn't about files, it's about commits.

  2. Commits are numbered, e.g., dcaa9f9 (seen in the git branch -vv output) or ac7cXXXX (seen in your git log output). These numbers—in hexadecimal—are hash IDs, so they aren't in any sensible order and not very useful for humans, but they are how Git really accesses each commit.

  3. The hash IDs are actually cryptographic checksums of the contents of the commit, which makes all parts of every commit completely read-only. Nothing can change in the commit once it's made. So in general we just add new commits to the repository, which is how Git stores history. The commits are the history.

  4. Commits store files, but not as changes. Each commit stores a full snapshot of every file—or more precisely, every file that Git knew about, at the time someone ran git commit to make that commit. (These are the tracked files: the untracked files are the ones that aren't in the next commit you'll make.)

  5. Commits also store metadata. This includes information about who made the commit, when, and why (the log message). In this metadata, Git stores, in each commit, some hash IDs. These are IDs of commits that existed at the time that you (or whoever) made the commit, so they're necessarily hash IDs of earlier commits. In general, most commits store exactly one hash ID: the very previous commit, from which this commit was made. Most of the remaining commits are merge commits, which store two hash IDs: the previous commit, and the commit that was merged.
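To make point 3 concrete, here is a minimal Python sketch of how Git derives the hash ID of a file's contents (a blob object), assuming a repository that uses the default SHA-1 object format. Commits are hashed the same way, with a `commit <size>\0` header over text that lists the tree and parent hash IDs, which is why changing anything in a commit, including its parent, would give a different hash ID.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash ID Git gives a file's contents (blob object, SHA-1 object format)."""
    header = f"blob {len(content)}\0".encode()  # object type, size, then a NUL byte
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b"hello\n"))   # ce013625030ba8dba906f756967f9e9ca394464a
print(blob_id(b"hello!\n"))  # one changed byte gives a completely different ID
```

In a real repository, echo hello | git hash-object --stdin should print the same ID as blob_id(b"hello\n").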

The hash IDs in the metadata, which Git calls the parent commit(s) of the commit in question, form the commits themselves into a DAG. In the case of a simple chain of commits—the most common thing—we'll draw this DAG-fragment ("DAGlet") like this:

... <-F <-G <-H

where H is the hash ID of the last commit in the chain. Then, being lazy, we'll get sloppy about our arrows, which lets us draw multiple DAGlets that branch and merge:

          I--J
         /    \
...--G--H      M--N   <-- main
         \    /
          K--L   <-- feature2

for instance. The names at the right, which automatically and always point to the last commit in the chain, are our branch names. The lettered nodes in the graph above are our commits, which store files permanently.
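The drawing above can be modeled as a map from each commit to its list of parents, plus a map from branch names to tip commits. This is a minimal sketch, with the letters as hypothetical stand-ins for hash IDs and G treated as the root: following parent links backwards from a branch tip finds every commit that is "on" that branch.

```python
# Letters are hypothetical stand-ins for hash IDs; G is treated as the root.
parents = {
    "G": [],
    "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
    "M": ["J", "L"],  # merge commit: two parents
    "N": ["M"],
}
branches = {"main": "N", "feature2": "L"}  # a branch name just holds a tip commit


def reachable(tip: str) -> set[str]:
    """Every commit found by following parent links backwards from tip."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


print(sorted(reachable(branches["feature2"])))  # ['G', 'H', 'K', 'L']
print(sorted(reachable(branches["main"])))      # all eight commits
```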

Git shows you changes by comparing the stored files. Pick any two commits. For instance, pick a parent/child pair, like G-H or H-I or M-N or whatever. Each of those commits has a full snapshot of every file. Perhaps the snapshot in H has one file that's different from that in G, and one file that isn't in G at all. Then the comparison of G vs H will show one changed file and one added file.
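A rough sketch of that comparison, assuming each snapshot is just a mapping from path name to file contents (the file names here are made up). Real git diff also does rename detection and produces line-by-line hunks, which this skips.

```python
def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Summarize how snapshot `new` differs from snapshot `old`."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }


G = {"README.md": "old text", "main.py": "print(1)"}
H = {"README.md": "new text", "main.py": "print(1)", "util.py": "x = 1"}
print(compare(G, H))
# {'added': ['util.py'], 'deleted': [], 'modified': ['README.md']}
```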

Note that to compare a commit against its parent (singular), we have to have just one parent. That's great for all the commits above, except for merge commit M. It has two parents. If you ask Git to show you what changed in M, should it compare J-vs-M, or L-vs-M?

It might be nice if it would do both. In fact, some Git commands do do both, but then they get a little squirrelly about that. The git log command, however, by default just doesn't bother to compare against either one. This is going to be a problem in a moment.

Meanwhile, there's one more thing to note about the files stored inside commits. They're stored not as files, but rather as special, read-only, Git-only, compressed and de-duplicated entities (Git calls these blob objects internally though you don't normally need to care about the details). Your own programs can't actually use these, so in order to make a commit useful, Git has to extract that commit, into a working area.

Hence, all the files that you see and work with when you work with a Git repository are not in the repository after all. They are in your working tree or work-tree. These are not in Git. They were at most extracted from Git. A future git commit won't use these files either: Git builds new commits from what Git calls, variously, the index, or the staging area, or—rarely these days—the cache.

When you pick some particular commit—by checking out a branch, by using git checkout master for instance—Git works by extracting that commit's files. Git uses the branch name, which holds the commit's hash ID, to find the commit. The original copies of the file, as seen in the commit, go into Git's index (where they're still de-duplicated so that they take virtually no space in the index) and into your working tree (where they're expanded back into usable files, which do take space).

We then work on / with our files—the ones that aren't in Git—because these are the useful files. When we're done working on / with them, we must run git add on at least some of them. We can run git add on all of them, en-masse all at once, to be lazy and let the computer do the work, as long as we're careful to make sure that Git won't auto-en-masse add untracked files that we don't want to have in the next commit. Or, we can run git add only on the ones we've changed. What this does is to tell Git: make the index / staging-area copy match my working tree copy, for each file we actually add. Git will now compress them down, de-duplicate them by checking against every existing file stored anywhere in the repository, and update the index / staging-area to refer to the correct file contents, ready to go into the next commit.

This means that the index / staging-area acts as a storage space for your proposed next commit. It always has all the files in it, it's just that most of the time, most of those files—or even all of them—match the files in the current commit.

When we make a new commit, Git simply packages up all the files that are in its index at that time, adds the appropriate metadata (including the hash ID of the current commit, as found through the branch name we picked earlier when we ran git checkout), and writes all of this stuff out to make a new commit. The new commit gets a new, random-looking hash ID that is guaranteed¹ to be different from all existing hash IDs. The new commit object goes into the database of all objects, indexed by hash IDs. And then Git stores the new hash ID into the branch name, so that the name picks out the latest commit.

With the invariant restored—that the current branch name holds the current hash ID and that we can find all earlier commits, one at a time, by following the parent links—Git is ready for more work. Note that the commit is made from whatever is in Git's index. The files in your working tree are irrelevant.
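A toy model of the three places files live (commits, the index, and the work-tree) may help here. This is a hypothetical Python class, not anything from Git itself, and the c0, c1 names stand in for hash IDs. It demonstrates the last point above: an edit that never went through add does not reach the commit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical three-area model: commits, index, work-tree."""
    commits: dict = field(default_factory=dict)   # commit id -> (parent id, snapshot)
    refs: dict = field(default_factory=dict)      # branch name -> commit id
    branch: str = "main"                          # the branch HEAD is attached to
    index: dict = field(default_factory=dict)     # path -> content (proposed next commit)
    worktree: dict = field(default_factory=dict)  # path -> content (your ordinary files)

    def add(self, path: str) -> None:
        # `git add`: make the index copy match the work-tree copy
        self.index[path] = self.worktree[path]

    def commit(self) -> str:
        parent = self.refs.get(self.branch)             # current commit, via the branch name
        cid = f"c{len(self.commits)}"                   # stand-in for a real hash ID
        self.commits[cid] = (parent, dict(self.index))  # snapshot is copied from the index
        self.refs[self.branch] = cid                    # branch name now picks the new commit
        return cid


repo = ToyRepo()
repo.worktree["README.md"] = "v1"
repo.add("README.md")
first = repo.commit()
repo.worktree["README.md"] = "v2, edited but never added"
repo.worktree["notes.txt"] = "n"
repo.add("notes.txt")
second = repo.commit()
print(repo.commits[second])  # ('c0', {'README.md': 'v1', 'notes.txt': 'n'})
```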


¹ What pigeonhole principle? Collisions never happen!


What you're seeing

Let's start with the git branch -vv output:

$ git branch -vv
  dataprocessing dcaa9f9 Merge pull request #122 from XYZaiXYZ/toyota
  master         dcaa9f9 [origin/master] Merge pull request #122 from XYZaiXYZ/toyota
* toyota         dcaa9f9 [origin/toyota: ahead 1] Merge pull request #122 from XYZaiXYZ/toyota

There's a fair amount of information here. We have three branch names. All three names identify the same commit, whose hash ID starts with dcaa9f9 (actual hash IDs are longer but any unique initial abbreviation of at least 4 characters suffices, so dcaa9f9 is fine here, and we can probably get away with just dcaa).

We have two remote-tracking names: these are our Git repository's memory of some other Git repository's branch names. These are set as the upstream of the corresponding (local) branch name: master links to origin/master as master's upstream, and toyota links to origin/toyota as its upstream.

We can't see the hash IDs that are stored in the remote-tracking names here, but git branch -vv does do something special, which we see in the third line: ahead 1. This means we have one commit on our (local) branch, toyota, that's not on their toyota branch. The origin Git repository has a toyota branch too, but their toyota stores a hash ID that isn't dcaa9f9. I don't know what it is, but I do know, from the ahead 1 text, that dcaa9f9 has this commit as its parent, or perhaps as one of its parents, plural, if dcaa9f9 is a merge commit.

Last, we also get the subject line of each commit message, for each commit. Since we get the same commit three times, we get the same subject line each time. The subject line we get is Merge pull request #122 from .... This is the kind of (terrible, but at least standardized) message that GitHub will generate, for instance, when you use their web interface to perform a merge. So dcaa9f9 is almost certainly a merge commit, with two parent commits. Our origin/toyota, which represents our Git's memory of origin's toyota, points to one of the parents of this merge commit.

Hence, if we were to draw this, we might draw it as:

...--I--J   <-- origin/toyota
         \
          M   <-- dataprocessing, master, toyota (HEAD), origin/master
         /
...--K--L

with the letter M standing in for commit dcaa9f9. I don't know the hash IDs of any of the other commits (except that J's starts with ac7c), but we won't really need them here.
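The ahead 1 count can be reproduced from this drawing. Git counts the commits reachable from the local branch but not from its upstream (and the reverse for "behind"); in a real repository, git rev-list --count origin/toyota..toyota prints the ahead number. In this sketch K, L and I are hypothetical fillers for the "..." parts; making L an ancestor of J is not a guess, since a two-parent M can only be exactly one commit ahead of J if its other parent is already reachable from J.

```python
# K, L and I are hypothetical fillers for the "..." parts of the drawing.
parents = {
    "K": [],
    "L": ["K"],
    "I": ["L"],       # "ahead 1" requires L to be reachable from J
    "J": ["I"],       # ac7c...: what origin/toyota remembers
    "M": ["L", "J"],  # dcaa9f9: the GitHub merge commit
}


def reachable(tip: str) -> set[str]:
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def ahead_behind(local: str, upstream: str) -> tuple[int, int]:
    """(commits only on local, commits only on upstream), as git branch -vv reports."""
    mine, theirs = reachable(local), reachable(upstream)
    return len(mine - theirs), len(theirs - mine)


print(ahead_behind("M", "J"))  # (1, 0): toyota vs origin/toyota
print(ahead_behind("M", "M"))  # (0, 0): master vs origin/master
```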

You also mention:

When in the branch:

$ git merge master
Already up to date.

This is, now, no surprise. The git merge command:

  • uses your current commit (M or dcaa9f9) as found through your current branch name (found via the special name HEAD, which is what it's doing in the drawing above);
  • takes, as an argument, something that locates another commit: here, master. It then finds the commit; and
  • then uses the commit graph we've drawn to find a merge base, i.e., a best shared common ancestor commit.

The commit you ask to merge is dcaa9f9. That is the current commit. The best shared commit is therefore dcaa9f9 itself. That commit is the current commit, so no merge necessary or even possible. The merge command says Already up to date. and quits.
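Here is a simplified sketch of that decision, using the same kind of hypothetical graph plus an extra commit X on some other line of work. Real Git computes merge bases far more efficiently and handles more cases (several merge bases from criss-cross merges, --no-ff, and so on), but the three outcomes below follow the logic described above.

```python
parents = {
    "K": [], "L": ["K"], "I": ["L"], "J": ["I"],
    "M": ["L", "J"],  # dcaa9f9 in the question
    "X": ["L"],       # hypothetical commit on some other branch
}


def ancestors(commit: str) -> set[str]:
    """The commit itself plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a: str, b: str) -> set[str]:
    common = ancestors(a) & ancestors(b)
    # keep only the best common ancestors: those not an ancestor of another common one
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}


def merge(current: str, other: str) -> str:
    bases = merge_bases(current, other)
    if other in bases:
        return "Already up to date."  # nothing on `other` that current lacks
    if current in bases:
        return "Fast-forward"         # just move the branch name forward
    return "real merge needed"


print(merge("M", "M"))  # what `git merge master` on toyota runs into
```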

$ git diff origin master

[prints nothing]: this too is unsurprising, though we need to learn one new Git trick. The git diff command takes two commit specifiers.² The two you gave are origin and master.

Now, origin is actually a remote, not a remote-tracking name. A remote, in Git, is a short name that stores a few things for easy access, and enables some other stuff. The main thing it stores, of interest to most people, is a URL. This is the URL your Git will use when it runs git fetch (or git pull, which runs git fetch). The "other stuff" it enables is the remote-tracking names, such as origin/master and origin/toyota.

The gitrevisions documentation describes a six-step process for turning a name like master or origin/master into a hash ID. Follow the documentation link, scroll down a bit if needed, and read through the six numbered steps. I won't quote them all here, but have a particular look at the last one: step six of six. It talks about looking for refs/remotes/name/HEAD. This will exist in your repository, and it will almost certainly be what Git calls a symbolic ref to origin/master.³

What all this adds up to, in the end, is that you're asking git diff to resolve origin/master to a hash ID—which it does, and gets dcaa9f9—and then to resolve master to a hash ID: dcaa9f9 again. Git then dutifully compares the snapshot in dcaa9f9 to the snapshot in dcaa9f9. Naturally, every file matches.
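The six-step lookup can be sketched in a few lines. The refs below mirror the question's output, with dcaa9f9 and the elided ac7cXXXX copied as-is. Real Git also warns when a name is ambiguous, reads packed refs, and understands many more revision spellings, none of which this toy does.

```python
# Refs mirror the question; "ac7cXXXX" is the elided ID from the question, kept as-is.
refs = {
    "HEAD": "ref: refs/heads/toyota",
    "refs/heads/master": "dcaa9f9",
    "refs/heads/toyota": "dcaa9f9",
    "refs/heads/dataprocessing": "dcaa9f9",
    "refs/remotes/origin/master": "dcaa9f9",
    "refs/remotes/origin/toyota": "ac7cXXXX",
    "refs/remotes/origin/HEAD": "ref: refs/remotes/origin/master",  # set by git clone
}


def resolve(name: str) -> str:
    """Try the six gitrevisions rules in order; follow symbolic refs."""
    candidates = [
        name,                         # 1. $GIT_DIR/<name>: HEAD, FETCH_HEAD, ...
        f"refs/{name}",               # 2.
        f"refs/tags/{name}",          # 3.
        f"refs/heads/{name}",         # 4.
        f"refs/remotes/{name}",       # 5.
        f"refs/remotes/{name}/HEAD",  # 6.
    ]
    for ref in candidates:
        if ref in refs:
            value = refs[ref]
            while value.startswith("ref: "):  # symbolic ref: look up its target
                value = refs[value[len("ref: "):]]
            return value
    raise KeyError(f"unknown revision: {name}")


print(resolve("origin"), resolve("master"))  # dcaa9f9 dcaa9f9
```

Both names land on dcaa9f9, so git diff origin master ends up comparing a commit with itself.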

Last, in this section anyway:

$ git log README.md 
commit ac7cXXXX (origin/toyota)
Author: Mona Jalal <mona@XYZ>
Date:   Fri Feb 5 22:40:32 2021 +0000

    fixed two typos in the README.md

Here, you may be running into a "feature" (often a mis-feature) of git log.

When you run git log, it works by:

  • Starting from some commit or commits that you pick: if you don't pick one or more starting commits, it starts from the current commit (via HEAD as usual).

    The git log code places these commit hash IDs into a priority queue. This is because it can only handle one commit at a time. However, when using HEAD, which only selects one commit, there's just one entry in the queue in the first place.

  • Walking the commit graph, one step at a time. This part can get quite tricky.

The commit graph walk makes use of the priority queue as follows:

  1. Take the front entry off the queue. (If the queue is empty we're done: quit.)
  2. Decide whether to print anything about this commit. If so, print stuff about it.
  3. Decide whether to visit this commit's parent or parents. If this is an ordinary single-parent commit, we'll visit the (single) parent (except under --no-walk of course). If this is a merge commit, though, choose which parent(s) to visit based on any history simplification that is in effect.
  4. Push any to-be-visited parent commits onto the priority queue, in priority order. (Omit any already-visited parent.)

The tricky part here is in step 3: deciding which parent(s) of a merge commit are to be visited. The tricky part here is also in step 2: deciding whether to print anything about this commit.

We first visit commit M, because that's the one commit in the queue:

  • Since M is a merge commit, git log is lazy and doesn't, initially at least, try to compare it to any of its parents. It just decides not to print commit M, because—after not checking—file README.md seems unchanged, because Git was too lazy to check. So even if M does have a change to README.md when compared to J or L, it's not printed here.

  • Since M is a merge, we check for history simplification. This is turned on! It's turned on because we have a pathspec: README.md. So now we check whether M is what git log calls "TREESAME" to any parent, after stripping the trees down based on the supplied pathspec(s). So now we actually do check whether M's parents, J and L, have the same README.md as M.

    If one of these two parents does have the same README.md, that's the one that this particular git log will follow. Apparently commit J (ac7c...) has the same README.md file as commit M. Commit J is the one that origin/toyota identifies, as we see that right after the commit's hash ID, in parentheses. (This is from the --decorate option, which defaults to "on" in modern Git.)

So, since commit J has the same README.md, git log visits M, doesn't print it, and puts commit J in the queue to walk next, but doesn't put commit L into the queue at all. This is what Git calls History Simplification in action.

Git now visits commit J, as it's the only commit in the queue. Commit J has, as its single parent, commit I, so git log does bother to compare I vs J, specifically to see if README.md changed between this pair of commits. It did, so git log does print commit J. That's how we know (a) that the merge chose J in its history simplification process, and (b) that commit J's hash ID starts with ac7c, which you left in your quote.

Since J has I as its parent, that's the commit that goes into the queue. As the queue was empty, it now has just the one commit in it, and git log goes on to look at commit I. This will repeat until git log runs out of commits, or you get tired of reading its output.
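Below is a rough Python sketch of that walk for a single pathspec, following the "Default mode" rules from the History Simplification section of the git log documentation: a commit is shown only if it is not TREESAME to any parent, and a merge that is TREESAME to some parent follows just that one parent. The graph is hypothetical (K and L stand for earlier history, J for the typo-fix commit, M for the GitHub merge), and rename detection, --full-history, and the other simplification modes are ignored.

```python
import heapq


def log_for_path(commits: dict, start: str, path: str) -> list[str]:
    """Commits a simplified `git log -- <path>` would list, newest first."""
    shown = []
    seen = {start}
    queue = [(-commits[start]["time"], start)]  # priority queue: newest commit first
    while queue:
        _, cid = heapq.heappop(queue)
        commit = commits[cid]
        mine = commit["tree"].get(path)  # None means the file is absent
        parents = commit["parents"]
        # TREESAME parents: those holding the same version of `path`
        same = [p for p in parents if commits[p]["tree"].get(path) == mine]
        treesame = bool(same) if parents else mine is None  # root vs empty tree
        if not treesame:
            shown.append(cid)
        # A merge TREESAME to some parent follows only the first such parent.
        follow = same[:1] if len(parents) > 1 and same else parents
        for p in follow:
            if p not in seen:
                seen.add(p)
                heapq.heappush(queue, (-commits[p]["time"], p))
    return shown


OLD, NEW = "README with typos", "README with typos fixed"
commits = {
    "K": {"time": 1, "parents": [], "tree": {"README.md": OLD}},
    "L": {"time": 2, "parents": ["K"], "tree": {"README.md": OLD}},
    "I": {"time": 3, "parents": ["L"], "tree": {"README.md": OLD}},
    "J": {"time": 4, "parents": ["I"], "tree": {"README.md": NEW}},
    "M": {"time": 5, "parents": ["L", "J"], "tree": {"README.md": NEW}},
}
print(log_for_path(commits, "M", "README.md"))  # ['J', 'K']
```

The merge M never appears, because it is TREESAME to J; the walk only reaches L later, by way of J's own history.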


² The git diff command is kind of fancy, so it can take none, one, two, or in some cases even more commit specifiers. It can also take pathnames and other arguments. This particular form of git diff takes two commit specifiers, though.

³ The value stored in origin/HEAD is normally set up by git clone when you do a clone. You can change it using git remote, with its set-head sub-command. The initial setting made by git clone depends on what the Git repository you're cloning has set up as its HEAD. With GitHub, that's usually either master or, since the recent switchover, main, though anyone who's the administrator of some GitHub repository can set whatever they like.


Summary so far

  • Git is about commits. Always look for the commit hash IDs, as they're what Git really cares about. If two hash IDs match, that's the same commit.
  • Commits store snapshots. What you see and work with are files from the snapshots, at best.
  • Git works with the commit graph. Use git log --graph to see it with Git (it's often good to use --oneline and --decorate: remember "DOG", Decorate Oneline Graph, here; modern Git has decorate on by default). Consider using a graphical viewer, if you find those helpful. Be aware that some graphical viewers are better than others. See also Pretty git branch graphs.
  • The git log command lies. This is deliberate, and is mostly, usually, a good thing. The only history in a Git repository is the commits in the repository. We often would like to see a "file history". This doesn't exist—but git log can fake one, by selectively lying to us. But if we're trying to figure out why some change got lost, this selective lying gets in the way. (This isn't your actual problem, but it's worth remembering.)

Your actual problem

I also did git clone the repo in a test dir and I can see [the correct README.md in the new clone]

This means that the commit you've checked out, in that new clone, has the correct contents in the file. Git copied the committed file to Git's index, and then on to your working tree. The working tree copy in the new clone shows you what's in the index copy, which is from the committed copy.

If your existing working tree copy doesn't match, that just means that ... your working tree copy doesn't match. That's all. Your working tree copy is yours. You can do whatever you like with it. You can print it out, crumple the printout into a ball, set fire to it, etc. You can remove the file, or encrypt it. Nothing you do to the working tree copy will affect Git's copies: those are safely stored inside commits, read-only, forever unchanging.

You can make new commits that have whatever you like in their README.md files, or that even don't have a README.md file, by changing your working tree copy and running git add README.md. This makes Git make its index copy match your working tree copy, and now a future git commit will save this version of the file.

Or, if you just want your working tree copy wiped out and replaced with a copy extracted from either an existing Git commit, or from Git's index as it appears right now, you can do that too. There is more than one way to do this. The best way in the most modern versions of Git (2.23 or later) is to use the new git restore command.

The git restore command is one of two commands that the Git folks used to break up the git checkout command. The problem is that git checkout is too powerful. It does too many different things. So they split it into git switch, which does about half of the things, and git restore, which does the other half.

To restore a working tree file from the HEAD-commit copy, you would use:

git restore --source HEAD --staged --worktree -- README.md

for instance. (This is the fully spelled out version; shorthand is allowed, but I'll skip it here as this answer is already quite long).

If you don't have this version of Git (2.23 or later), you can achieve the above with:

git checkout HEAD -- README.md

This does in fact still work in Git 2.23 or later, so you can use this form (which is already shorthand) in the most modern Git versions, too.

Note that these wipe out the version of README.md you have in your working tree. Git will not be able to get back any version that was not already committed. To get back the version from some historical commit—rather than from the current or HEAD commit—just replace the source part, HEAD, with the raw hash ID of that commit, or with any of the spellings that will let Git find that hash ID: see the gitrevisions documentation again.
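As a toy model, git restore --source HEAD --staged --worktree -- README.md amounts to copying one file from a commit's snapshot over both the index copy and the work-tree copy. The function below is hypothetical and assumes the path exists in the source snapshot. With the real git restore, leaving out --source makes the index the default source when restoring the work-tree, while --staged defaults to HEAD.

```python
def restore(path: str, source: dict[str, str], index: dict[str, str],
            worktree: dict[str, str], staged: bool, in_worktree: bool) -> None:
    """Toy model of `git restore --source=<commit> [--staged] [--worktree] -- <path>`."""
    if staged:
        index[path] = source[path]      # index copy now matches the source commit
    if in_worktree:
        worktree[path] = source[path]   # old work-tree copy is simply overwritten


head = {"README.md": "fixed two typos"}  # hypothetical HEAD snapshot
index = {"README.md": "fixed two typos"}
worktree = {"README.md": "stale local copy"}
restore("README.md", head, index, worktree, staged=True, in_worktree=True)
print(worktree["README.md"])  # fixed two typos
```

Nothing keeps the "stale local copy" text once it is overwritten, which is why the warning above matters.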

(The reason git checkout got split up is that the git switch set of operations are the ones that are "safe": Git will check whether you're destroying unsaved work, and tell you so, unless you force the operation with --force. The git restore set are "unsafe": they assume you know that you're telling Git wipe out my work, and just do it. Putting both under one front end, git checkout, is a recipe for disaster: people learn that git checkout is safe... and it is, until it isn't.)

torek

@mona-jalal, I really appreciate you giving all the details. As you are able to see the README contents in the new clone, we at least know that the commit is present in the repo and intact. Somehow your local copy got criss-crossed. I know it is tough to move the training data around. There are a few things you could try, but they can get more complicated than moving your training data into the newly cloned copy's directory and using that.

All the best!

Mohana Rao