In an Azure DevOps pipeline, I am trying to zip and deploy files listed via Git. Because deliveries are not always made on time, I sometimes need to list the files to deploy relative to an earlier commit, so I use the following command:

git diff --name-only --diff-filter=d

However, it listed a file in a folder that no longer exists, and I could not find any trace of the file with git log either.

Strangely, the file appears in the command's output wrapped in double quotes and marked as "Modified":

[Image: diff output]

PS: sorry, the missing information is that the double-quoted line containing \303\252 is the one causing my problem. I produced the output with git diff --name-status --diff-filter=d. In fact, \303\252 comes from an encoding conversion issue between ANSI (Windows-1252) and UTF-8; the file is Suivi_Entête_cr_old.asp.

I don't understand why git log gives me no answers, nor how the file can be reported as modified when it does not exist. I suspect that the folder has been deleted, but when? I can't find a way to get a correct and detailed explanation of the problem.

PS2: with the command git -c core.quotepath=false --no-pager diff --name-only, the output is: Appli_web_livre/credits_invendus/Suivi_Ent??te_cr_old.asp

PS3: if the PowerShell console uses ASCII encoding, ?? replaces the ê, but after switching the console to UTF-8:

[Console]::OutputEncoding = [System.Text.Encoding]::UTF8

I get the right string.

Duplicate of: The output of git diff is not handled correctly in powershell

David Bru
  • For the same commit, can you compare the output of : `git diff --name-status` and `git diff --name-status --diff-filter=d` ? – LeGEC Sep 21 '20 at 12:19
  • Two extra notes: 1. it is preferable to post the output of commands as text instead of a screen capture; 2. can you describe the command you used to produce said output? `--name-only` would not output a status letter in the left column. – LeGEC Sep 21 '20 at 12:21
  • The double quotes are not all that important, relevant, they are there because the path contains escaped characters. I am wondering if those could be the reason why you're finding its containing directory in an unexpected state. How are you looking for the directory/files when you mention they are not to be seen? – Ondrej K. Sep 21 '20 at 12:38
  • Uh... gonna disagree with Ondrej on that one. The double-quotes are super important because they're telling you that whole string is a filename in the root directory, not the path they otherwise appear to be. – Mark Adelsberger Sep 21 '20 at 14:41
  • First of all, you should check whether all of the above files are stored in Git. Use `git ls-tree -r master --name-only`, replacing `master` with your branch. – Thân LƯƠNG Đình Sep 21 '20 at 14:54
  • @MarkAdelsberger I am perplexed but open to the notion I am wrong. Is the double-quoted path quoted because it's rooted elsewhere or has a different meaning relative to the unquoted paths around it? Are those not filenames in their entirety, or are they rooted at a different place? Where? However, I've double checked: quoted (just like unquoted) paths appear to still be rooted at the same place (repo root for `git diff --name-only`, or cwd for `git status`), but quoting is used for names with escaped characters (but not for spaces, even trailing ones; that's actually a bit evil). Is this behavior new/specific to 2.28? – Ondrej K. Sep 21 '20 at 16:55
  • The path is double quoted because it contains characters that Git thought should be quoted, namely the `\303\252` part. This is controlled by the `core.quotePath` setting. The Git code that quotes names is not self-consistent; Junio Hamano proposed a fix recently on the Git mailing list. – torek Sep 21 '20 at 22:30

1 Answer

I don't understand why git log gives me no answers, nor how the file can be reported as modified when it does not exist. I suspect that the folder has been deleted, but when?

In an important sense, Git doesn't actually store folders. It stores files that have long names that your OS demands be chopped up, such as Appli_web_livre/credits_invendus/CR_ajout_article.asp. That's all one file's name: it's not a folder holding a folder holding a file, it's just a name. It's your OS that demands that Git create a folder Appli_web_livre first, and so on.

The way Git deals with this is that, whenever it needs to create a work-tree file named (say) Appli_web_livre/credits_invendus/CR_ajout_article.asp, it will create a folder credits_invendus within a folder Appli_web_livre, so that the OS will be happy. This does lead to problems sometimes, such as when you have a file named Appli_web_livre in the way. (The internal code in Git, where it's building up OS operations to perform for a git checkout or git reset or similar, calls this a "D/F conflict": directory where file is needed or vice versa.)
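A minimal Python sketch of that step, under the assumption of a hypothetical helper `write_worktree_file` (this is not Git's actual checkout code): the flat name is split on `/`, the parent folders are created on demand, and a regular file sitting where a folder is needed makes the operation fail, much like a D/F conflict.

```python
import os


def write_worktree_file(root, name, data):
    """Write `data` under `root` at the slash-separated, Git-style `name`.

    Hypothetical helper for illustration only; Git's own code differs.
    """
    target = os.path.join(root, *name.split("/"))
    parent = os.path.dirname(target)
    # The OS, not Git, requires these folders to exist before the file.
    # This raises an OSError subclass if a regular file is in the way.
    os.makedirs(parent, exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(data)
    return target
```

Calling it with the name Appli_web_livre/credits_invendus/CR_ajout_article.asp creates both folders as a side effect; the folders themselves were never part of what was stored.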

This matters for various reasons, including:

I can't find a way to get a correct and detailed explanation of the problem.

The git diff command you mention:

git diff --name-only --diff-filter=d

generally compares two commits. (There are a lot of exceptions to this, but since you mentioned an Azure DevOps pipeline, I'm guessing here that you are using this mode.) To make this happen, you must specify both commits:

git diff <commit1> <commit2>

The commit named at the left, commit1 here, becomes the left side of the difference operation, and the commit named at the right, commit2, becomes the right side.[1] Each commit holds a full snapshot of every file that Git knew about when you made the commit (or when whoever made the commit, made it). If there are files with the same names on the left and right, those are the same file. If there's a file Appli_web_livre/credits_invendus/CR_ajout_article.asp on the left, but not on the right, then the file was Deleted;[2] and if there is one on the right but not the left, then the file is newly-created (Added). Otherwise, if the two copies don't match, the file is Modified. There are a few other possible letter-codes (see footnote 2 for Renamed, and Type-changed shows up when one side has a symbolic link and the other side has a file, for instance) but A, D, and M are the three main ones of interest here.
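To make the A/D/M rule concrete, here is a small Python sketch. It assumes each snapshot is modeled as a plain dictionary from file name to some content identifier; the function name `classify` and the identifiers are made up for illustration, and rename detection is deliberately left out.

```python
def classify(left, right):
    """Compare two snapshots mapping file name -> content id.

    Returns {name: status} with A (added), D (deleted) or M (modified).
    Identical files are omitted; rename detection is not modeled.
    """
    statuses = {}
    for name in sorted(set(left) | set(right)):
        if name not in right:
            statuses[name] = "D"   # on the left only
        elif name not in left:
            statuses[name] = "A"   # on the right only
        elif left[name] != right[name]:
            statuses[name] = "M"   # same name, different content
    return statuses
```

Note that a file can only come out as M if its name is present in both snapshots, which is the key point for the question.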

Normally, for any file that's exactly the same, Git says nothing at all, and for any file that's changed in any way—including being newly added, or deleted, or just modified—Git will print the file's name and then show a set of changes that, if applied, will produce the right-side file from the left-side file. The --name-only flag tells git diff to produce only the names of added, deleted, or modified files.

The --diff-filter option tells git diff that, for some changes, it should suppress everything, and for others, it should print whatever is left to print: the name and the status letter, and without --name-only, the change as well. The letters you put after the = sign are the ones to be printed, or to be suppressed. Uppercase letters tell Git which ones to print:

--diff-filter=AD

for instance would print any Added and Deleted files, but omit modified files or those with any other letter code. Lowercase, on the other hand, means exclude this type, so:

--diff-filter=d

means exclude D files, i.e., print Added, Modified, Renamed, or Type-changed files, but not Deleted files.[3] As the status here is M, that part is fine.
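The uppercase/lowercase rule can be sketched in Python as follows. This is a simplified model, not Git's implementation: the name `selected_statuses` is hypothetical, the special `*` character is not handled, and how real Git treats specs that mix uppercase and lowercase letters is outside what this sketch claims to reproduce.

```python
ALL_STATUSES = frozenset("ABCDMRTUX")


def selected_statuses(spec):
    """Return the status letters that a --diff-filter spec lets through.

    Simplified model: uppercase letters select, lowercase letters exclude,
    and a spec without any uppercase letter starts from every status.
    """
    include = {c for c in spec if c.isupper()}
    exclude = {c.upper() for c in spec if c.islower()}
    base = include if include else set(ALL_STATUSES)
    return base - exclude
```

With this model, `selected_statuses("d")` contains M but not D, which matches the behavior described above.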

What this means is that the two input commits both contain a file whose name is Appli_web_livre/credits_invendus/CR_ajout_article.asp and the two copies of those files differ. Without knowing what the input commits are, there is little more we can say about this.


[1] The -R flag reverses the sides. This isn't required with this form of diff since you could just reverse the two arguments, but some git diff operations use your work-tree contents, or the index contents, or an implied HEAD, and for these, the -R flag is particularly convenient, or even necessary.

[2] When the --find-renames option is enabled, a deleted file on the left can be paired with a newly-created file on the right. This kind of pairing-up then tells git diff to declare that these files, despite the name change, are the same file, in the same way that a copy of the Ship of Theseus might actually be the Ship of Theseus. In this case, though, that did not happen.

[3] The full set of possible letters is listed as ABCDMRTUX. Status code B is only possible when using the -B option and indicates a broken pairing, but I've never been able to get it to come out in any test. Status code U can only occur when you're in the middle of an incomplete merge. Status code C can only occur when you have enabled --find-copies. Status code X indicates a bug in the diff code and therefore should never occur. In practice, your git diff should produce only A, C, D, M, R, or T; and C and R should only occur if you have copy and/or rename detection enabled. Since git diff obeys user configurations, this will depend on your user configuration.

Because git diff obeys user configuration, scripts should rarely use git diff itself, but rather should invoke one of the more specific plumbing commands, such as git diff-tree.
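As a sketch of what such a script might look like, the Python below runs git diff-tree with -r, -z, --name-status and --diff-filter=d and parses the NUL-separated result. The function names are hypothetical, and the parser assumes rename and copy detection are off, so that every record is one status field followed by exactly one path field. With -z, paths are emitted verbatim, so no quoting has to be undone.

```python
import subprocess


def parse_name_status_z(raw):
    """Parse `--name-status -z` output (bytes) into (status, path) pairs.

    Assumes rename/copy detection is off: one status, then one path.
    Assumes path bytes are UTF-8.
    """
    fields = raw.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # drop the empty piece after the final NUL
    pairs = []
    for i in range(0, len(fields), 2):
        status = fields[i].decode("ascii")
        path = fields[i + 1].decode("utf-8")
        pairs.append((status, path))
    return pairs


def changed_files(commit1, commit2):
    """List non-deleted changes between two commits via plumbing."""
    cmd = ["git", "diff-tree", "-r", "-z", "--name-status",
           "--diff-filter=d", commit1, commit2]
    raw = subprocess.run(cmd, check=True, capture_output=True).stdout
    return parse_name_status_z(raw)
```

Reading the output as bytes and decoding it in the script also sidesteps whatever encoding the console happens to use.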


Additional specifics

It listed a file in a folder that no longer exists, and I could not find any trace of the file with git log either.

You don't show what command you used with git log, but if you ran:

git log Appli_web_livre/credits_invendus/CR_ajout_article.asp

for instance, this turns on what git log calls history simplification. The goal of history simplification is to eliminate from view any commit that doesn't affect what you see now, in the commit you start with. So if the file is gone now, you often won't see any of the commits that do have the file. For more detail, see my answers here and here. If the files were accidentally deleted via merge, this is not sufficient; see this answer as well.

Strangely, the file appears in the command's output wrapped in double quotes ...

Files are double-quoted when at least one character in the file name requires quoting, or—sometimes but not always—when there is a space in the file's name. The git config documentation describes this under the core.quotePath variable:

Commands that output paths (e.g. ls-files, diff), will quote "unusual" characters in the pathname by enclosing the pathname in double-quotes and escaping those characters with backslashes in the same way C escapes control characters (e.g. \t for TAB, \n for LF, \\ for backslash) or bytes with values larger than 0x80 (e.g. octal \302\265 for "micro" in UTF-8). If this variable is set to false, bytes higher than 0x80 are not considered "unusual" any more. Double-quotes, backslash and control characters are always escaped regardless of the setting of this variable. A simple space character is not considered "unusual". Many commands can output pathnames completely verbatim using the -z option. The default value is true.

This particular file name has \303\252 in it, and by default, that gets the string quoted.
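For illustration, here is a Python sketch that undoes this quoting. The helper name `unquote_git_path` is hypothetical, the quoted path used below is reconstructed from the details given in the question, and the sketch assumes the underlying path bytes are UTF-8 and that octal escapes always have three digits.

```python
_SIMPLE_ESCAPES = {
    "a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
    '"': 34, "\\": 92,
}


def unquote_git_path(text):
    """Decode a path as printed by Git when core.quotePath is true.

    Unquoted input is returned unchanged. Assumes UTF-8 path bytes.
    """
    if not (len(text) >= 2 and text.startswith('"') and text.endswith('"')):
        return text
    body = text[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
        elif body[i + 1] in "01234567":
            # Three octal digits encode one raw byte, e.g. \303 -> 0xC3.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
        else:
            out.append(_SIMPLE_ESCAPES[body[i + 1]])
            i += 2
    return out.decode("utf-8")
```

Applied to "Appli_web_livre/credits_invendus/Suivi_Ent\303\252te_cr_old.asp", the two escaped bytes 0xC3 0xAA decode to ê, giving the name Suivi_Entête_cr_old.asp. In a script it is usually simpler to avoid the quoting altogether with -z than to undo it afterwards.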

(Whitespace quoting is inconsistent between git status and some other commands, and there is some code improvement coming in a future Git release to fix this.)

torek