git add all files with nonempty diff

Question

Is there a git command to add all modified files for which the output of git diff <file> is not empty?*

(*Modified but empty diff can happen by using tools that strip out certain parts of the files automatically)

torek · Accepted Answer · 2018-05-29T00:11:38.387

TL;DR

Consider just using git add -u. Your other main option is to write your own program, but there's really little point.

Long

Obsidian's answer is slightly wrong, in a not-so-important way, but there's a piece missing from the original question (at least at this point—the question might get edited to fix this) that could mislead:

(*Modified but empty diff can happen by using tools that strip out certain parts of the files before)

... before ... what? :-)

I think I know what you meant to say here: you can have a clean filter that edits files while Git is copying the files' contents from the work-tree into the index / staging-area.

Remember that in Git, there are in effect three active copies of a file while you work (or more if you're in the middle of a conflicted merge, although then it depends on how one counts). For each file:

There is the HEAD version, which is read-only. You can git show HEAD:path to view it. It's actually stored in a special, Git-only, compressed format; git show has to expand it out.
There is the index / staging-area version. This is initially just a copy of the HEAD version of the file. Like the HEAD version, it's in a special, Git-only, compressed format; but unlike the HEAD version, you can overwrite this file.
Last, there's a version that Git itself doesn't care about, but you probably do: that's the version in your work-tree. This is in its ordinary format that your computer can deal with, so it's read/write and you can do anything you want to it. Git won't care a bit; Git is concerned mainly with the index version.
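To see the three copies side by side, here is a minimal sketch that builds a throwaway repository and reads each version. The file name demo.txt and the commit identity are made up for illustration; git show :path is the usual way to read the index copy.

```shell
set -eu
# Throwaway repository so the example is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "committed" > demo.txt          # demo.txt is a hypothetical file name
git add demo.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

echo "staged" > demo.txt
git add demo.txt                     # the index copy now says "staged"
echo "work-tree" > demo.txt          # the work-tree copy differs again

head_copy=$(git show HEAD:demo.txt)  # read-only committed version
index_copy=$(git show :demo.txt)     # index / staging-area version
tree_copy=$(cat demo.txt)            # ordinary file on disk
echo "HEAD: $head_copy | index: $index_copy | work-tree: $tree_copy"
```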

Because the index and work-tree versions are both writable, you can change one or both of them. Normally, you change the work-tree version, then tell Git to copy that work-tree version back into the index. That's what git add does: copy from the work-tree, into the index. If you have a clean filter defined, and/or if you set up line-ending changes, git add copies the file into the index while applying the filters.
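As a rough illustration of a clean filter at work, the sketch below defines a filter named stripout (a made-up name) that deletes every line starting with "OUTPUT:", then shows that git add stores the cleaned text in the index while the work-tree file keeps the extra line. The sed command stands in for a real tool such as nbstripout; it is an assumption, not what that tool does internally.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical clean filter: drop lines that begin with "OUTPUT:".
git config filter.stripout.clean "sed '/^OUTPUT:/d'"
echo "*.txt filter=stripout" > .gitattributes

printf 'code\nOUTPUT: 42\n' > note.txt
git add note.txt                  # copies note.txt into the index through the filter

index_copy=$(git show :note.txt)  # cleaned content stored in the index
tree_copy=$(cat note.txt)         # untouched content in the work-tree
echo "index: $index_copy"
echo "work-tree: $tree_copy"
```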

The commands (plural) that copy the file from the index to the work-tree can apply smudge filters and/or do line-ending changes as well. The usual one that we work with every day is git checkout, which copies from the index to the work-tree, or copies from a commit to both the index and the work-tree, depending on how you invoke git checkout.

Because smudge and clean filters exist, git diff has to do some special magic when comparing the index version of a file to the work-tree version of that same file. The choice Git makes is to run the clean filter on the work-tree copy. See my answer to a related question for details. There's some optimization that happens here as well: Git tries to know whether the cleaned work-tree file matches the index version, without having to run the clean filter over the file. If you change your filters (this includes changing end-of-line settings), Git may become confused about the cleanliness of each file. The simplest solution to this problem is to use this two-step process:

Remove .git/index: this removes all the staged files.
Run git reset (with no additional options): this re-creates the index from the HEAD commit.
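The two steps can be tried safely in a scratch repository. This sketch assumes the index lives at .git/index (true for an ordinary, non-linked work-tree) and uses a made-up file f.txt; note how the staged change is lost and reappears as an unstaged one.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
echo one > f.txt
git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

echo two > f.txt
git add f.txt                 # a carefully staged change, about to be lost

rm .git/index                 # step 1: throw away the index
git reset -q                  # step 2: rebuild it from the HEAD commit

staged=$(git diff --cached --name-only)   # nothing is staged any more
unstaged=$(git diff --name-only)          # the change is back to "unstaged"
echo "staged: [$staged] unstaged: [$unstaged]"
```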

Of course, that also wipes out any carefully-staged files you had. An alternative is to update the modification time stamp on every file in the work-tree (e.g., find . -name .git -prune -o -print0 | xargs -0 touch), but that also has annoying side effects. Git needs a command (or flag to git reset) to invalidate and recompute all the cache data, without erasing the staged files. It doesn't have one, so we are all stuck.

This brings us back to the original question:

Is there a git command to add all modified files for which the output of git diff <file> is not empty?

Yes, sort of: it's git add, specifically git add -u, which is documented this way:

If no <pathspec> is given when -u option is used, all tracked files in the entire working tree are updated (old versions of Git used to limit the update to the current directory and its subdirectories).

I say "sort of" because this also removes files that are missing in the work-tree but are present in the index. It also, arguably at least, re-adds files that don't show any difference between the index copy and the work-tree copy.

When git add has finished adding the cleaned file, Git updates its cache information in the index, so that it now knows that the file is clean. If this adds to the index a copy of the file that matches the copy that was already in the index, well, so what? The file in the index is unchanged, except that now the cached time-stamp information in the index is correct. You spend a bit of compute time doing this, but that's probably better than spending your own personal time.
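If you would rather stage only the paths that git diff actually reports, the "write your own program" option can be as small as one pipeline. The sketch below is an illustration under assumptions: the stripout filter, noisy.txt and real.txt are all made up, and sed stands in for a real stripping tool.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config filter.stripout.clean "sed '/^OUTPUT:/d'"   # hypothetical filter
echo "*.txt filter=stripout" > .gitattributes
echo code > noisy.txt
echo code > real.txt
git add .gitattributes noisy.txt real.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

echo "OUTPUT: 42" >> noisy.txt   # vanishes under the clean filter
echo "more code" >> real.txt     # a genuine change

# Stage only the paths for which git diff reports a difference.
git diff --name-only -z | xargs -0 git add --

staged=$(git diff --cached --name-only)
echo "staged: $staged"
```

In this scenario plain git add -u should leave the index with the same content, which is why the answer suggests it as the simpler choice.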

Obsidian · Answer 2 · 2018-05-28T21:08:40.320

(*Modified but empty diff can happen by using tools that strip out certain parts of the files before)

No, it can't. If you remove parts of a tracked file, this will appear as a change block with lines starting with a « - » sign. Also, Git works with snapshots: it always stores the full file content or nothing at all, and uses a SHA-1 sum to name and certify it. So, if you get an empty diff, the file is necessarily unchanged, unless I got something wrong.

As regards the rest of your question, you can still use

git add -u

… to automatically add updated files that were already tracked.

EDIT: when using git diff <filename> with a single file name, your file is compared with the index. In other words, once you've added the modified file, it disappears from the regular diff and switches from red to green in the git status result list.

It's handy to check out what's still to be added or not. You can see what you have already added with

git diff --cached

(or --staged, which is a synonym). If you want to « unadd » a file, use

git reset <filename>

… to bring the index back to the state of the commit pointed to by HEAD without changing the working tree (that is to say, don't use --hard here).

Then you need to explain to me what leads to a file being listed under "modified" when calling `git status` and `git diff ` outputting nothing. Also, look at https://github.com/kynan/nbstripout if you know how to work with Jupyter notebooks. This is the behavior I get from enabling this tool in a project. I want to add all changes that are not stripped out. There is often no difference to the last committed version after removing the output. — clstaudt, May 28 '18 at 20:44
