This is not an answer to the original question, but a response to one of the comments on it:
git tags commits. Not files. Can you tell us what you want to do with
these "tags"? I think a more appropriate answer would be possible
then. – Noufal Ibrahim
I am posting this extended comment as a pseudo-answer:
- because Stack Overflow comments have limited formatting, and
- because I don't think it is appropriate to add my explanation of why I want single-file or subset tagging to the OP's question, in case my reasons are not the same.
BRIEF ANSWER:
When I do a diff between tag-for-code.txt--update-by-jim-April1st and tag-for-code.txt--original-version-by-joe, I only want to see diffs for my-lib/import/new-module/code.txt, or perhaps for my-lib/import/new-module.
I don't want to see diffs for my-lib/import/module1, which is supposedly completely independent[*] of my-lib/import/new-module/code.txt.
And no, I don't want to have to know which parts I should filter out. I may not know that without digging deeper.
I understand that git tags are for commits, and commits are essentially snapshots of the entire repository. So I'm just asking for a workaround that gives me the convenience of saying "diff tag1 tag2" and having it refer only to the subset of files explicitly identified as belonging to tag1 and tag2, not to a whole slew of other independent files that happened to change in between.
E.g., perhaps a hypothetical subset tag could create a file containing a list of filenames plus the repo commit ID. Tools that use such subset tags would then show only information relevant to the filenames listed in their respective tag files. Or perhaps the tag file would just list blob IDs. Whatever.
Has anyone got a BKM (best known method) for doing this?
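To make the tag-file idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: SubsetTag, its fields, and filter_diff are illustrative names, not an existing git feature. It assumes you already have the list of paths that changed between the two commits (in real git, git diff --name-only tag1 tag2 prints that list). It keeps only paths that either tag claims, where a tag claims a path if the path is one of its recorded files or sits under one of its recorded directories. The commit field is carried along to record the whole-repo snapshot but is not used by the filter.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SubsetTag:
    """Hypothetical subset tag: a name, the whole-repo commit it was made on,
    and the paths (files or directories) the tag is actually about."""
    name: str
    commit: str
    paths: Tuple[str, ...]

    def covers(self, path: str) -> bool:
        """True if `path` is a tagged file or lies under a tagged directory."""
        for p in self.paths:
            p = p.rstrip("/")
            if path == p or path.startswith(p + "/"):
                return True
        return False


def filter_diff(changed: Iterable[str], a: SubsetTag, b: SubsetTag) -> List[str]:
    """Keep only the changed paths that either subset tag claims."""
    return sorted(p for p in changed if a.covers(p) or b.covers(p))


if __name__ == "__main__":
    joe = SubsetTag("code.txt--original-version-by-joe", "commit-A",
                    ("my-lib/import/new-module/code.txt",))
    jim = SubsetTag("code.txt--update-by-jim-April1st", "commit-B",
                    ("my-lib/import/new-module/",))
    # Everything that changed between the two commits (hypothetical list).
    changed = [
        "my-lib/import/module1/README",
        "my-lib/import/new-module/code.txt",
        "my-lib/import/new-module-old/code.txt",
    ]
    print(filter_diff(changed, joe, jim))
    # ['my-lib/import/new-module/code.txt']
```

The filtering itself is what a path-limited git diff tag1 tag2 -- <paths> already does. The point of the sketch is that the paths come from the tags, so the user does not need to know in advance what to filter out.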
===
I am posting this as a pseudo-answer because I have long suffered intellectual pain from git's lack of single-file or subset tagging. Surely there is a git equivalent? I suspect there is not, given the amazing number of off-target responses, often of the form "why do you want to do that?"
Some other version control systems have both entire-repo and single-file or subset tagging. AFAICT git does not. Without loss of generality I will provide an example from CVS, although these concepts apply just as well to DVCSes.
Brief summary of why you want single-file and file-subset tagging as well as whole-repo tagging
We all agree that symbolic tag names are good. Right?
In the bad old days there were only single-file tags.
Although you could usually apply the same tag to groups of files, there was no guarantee that the same tag had been applied to the entire repository.
So if you did checkout -rTestsRunTag, expecting to be able to build and run the tests successfully, it might fail because some file you did not know you depended on was not tagged with TestsRunTag.
Hence the preference for whole-repository tags: tags that apply to a snapshot of the whole repository.
Hopefully, if you check out such a whole-repository tag, you're guaranteed to be able to build successfully. Right? ...
Actually not right. Did you put your build tools, compiler, etc. in your repository?
Nevertheless, whole repository tagging was a very good step towards reproducible builds and tests.
Nevertheless^2, VCSes that support only whole-repository tagging throw the baby out with the bathwater.
There is still a need for single-file tagging. More often, there is a need to tag a group of files where the group is not the entire repo: typically a directory subtree, or a few related subtrees.
Basically, you need subset tagging when the tag is relevant only to a subset: when the tag is irrelevant, or even confusing, for the repo as a whole, or when using a whole-repo tag is inconvenient.
As I try to explain in the example below: when I do a diff between code.txt--update-by-jim-April1st and code.txt--original-version-by-joe, I only want to see diffs for my-lib/import/new-module/code.txt, or perhaps for my-lib/import/new-module.
I don't want to see diffs for my-lib/import/module1, which is supposedly completely independent of my-lib/import/new-module/code.txt.
Brief(?) example of wanting non-full repo tagging
Here's a brief example; it is what just prompted me to search for and post this.
I have a library of mostly independent things. Call it my-lib.
I have collected many modules from around the web and put them in this library, in places like my-lib/import/module1, my-lib/import/module2, keeping them separate from each other and from my own stuff like my-lib/my-stuff-my-module1, and so on.
I am adding to this library a new module that I have imported from some website. Let's call it my-lib/import/new-module.
Unfortunately, that module does not have its own version control system. It was posted on a discussion thread, with slightly different versions by different users.
I'm not quite sure which version I want to use, so I'm going to put a few of them in my library.
WLOG, let me just talk about a single file in the module: my-lib/import/new-module/code.txt.
So I download the first version I find on the discussion thread, place it into my-lib/import/new-module/code.txt, and check it in.
I would like to give it a symbolic name, since that is nicer than using either git hashes or numeric version numbers like CVS's 1.1.1.1.
How about the symbolic "tag" name code.txt--original-version-by-joe?
Although I'm just as likely to put dates, like the original creation date, in the tag name, as well as in the file's commit message, and ideally in further comments attached to a description of the tag.
And I don't need to have the filename in the tag; it's just an example.
Perhaps I use this for a while. But eventually I see a different version on the discussion thread.
Next I download that version, place it into my-lib/import/new-module/code.txt, and check it in.
I would like to give it a different symbolic name.
How about code.txt--update-by-jim-April1st?
I hope that this is sufficient to show why I want tags that apply to single files, or subsets of files, and not to the entire repository.
The tags code.txt--original-version-by-joe and code.txt--update-by-jim-April1st are relevant only to the module my-lib/import/new-module and its file my-lib/import/new-module/code.txt.
These new-module tags are irrelevant to other modules from which it is completely independent, such as my-lib/import/module1, my-lib/import/module2, and my-lib/my-stuff-my-module1.
When I do a diff between code.txt--update-by-jim-April1st and code.txt--original-version-by-joe, I only want to see diffs for my-lib/import/new-module/code.txt, or perhaps for my-lib/import/new-module.
I don't want to see diffs for my-lib/import/module1, which is supposedly completely independent[*] of my-lib/import/new-module/code.txt.
[*] Note: there is no such thing as "completely independent", but...
Note that I said "supposedly completely independent". That's the gotcha.
Even completely independent library modules may break the library infrastructure for crosscutting stuff like makefiles, even if they are never linked into the same program.
But nevertheless, it is very convenient for the default diff between code.txt--update-by-jim-April1st and code.txt--original-version-by-joe to show only diffs for my-lib/import/new-module/code.txt, and not for the supposedly completely independent my-lib/import/module1.
It is also convenient to have ways of saying "diff against the snapshot of the entire repository at the time the single-file tag code.txt--update-by-jim-April1st was created".
That's unambiguous if the tag is on only a single file.
Minor issues arise if the same tag is applied to versions of multiple files that do not reside in a single entire-repository snapshot (e.g. a git commit), but you can deal with that.
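Here is one way to make "the whole-repository snapshot at the time the tag was created" concrete, sketched in Python under simplifying assumptions. History is a linear list of commits, oldest first, and each commit is a full snapshot mapping path to blob ID. A subset tag is just a mapping from path to the blob ID it pins. The resolution rule is hypothetical, one policy among several: return the earliest commit that contains every tagged blob. If no single commit contains them all, that is the "minor issue" case above, and the function returns None.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]             # path -> blob ID
History = List[Tuple[str, Snapshot]]  # (commit ID, snapshot), oldest first


def matching_commits(history: History, tagged: Snapshot) -> List[str]:
    """Commit IDs whose snapshot contains every (path, blob) pair in the tag."""
    return [cid for cid, snap in history
            if all(snap.get(path) == blob for path, blob in tagged.items())]


def resolve_snapshot(history: History, tagged: Snapshot) -> Optional[str]:
    """Earliest commit holding all tagged file versions, or None if the tagged
    versions never coexisted in one whole-repository snapshot."""
    matches = matching_commits(history, tagged)
    return matches[0] if matches else None


if __name__ == "__main__":
    CODE = "my-lib/import/new-module/code.txt"
    MOD1 = "my-lib/import/module1/x.c"  # hypothetical file
    history: History = [
        ("c1", {CODE: "joe", MOD1: "m1-a"}),
        ("c2", {CODE: "joe", MOD1: "m1-b"}),
        ("c3", {CODE: "jim", MOD1: "m1-b"}),
        ("c4", {CODE: "jim", MOD1: "m1-c"}),
    ]
    print(resolve_snapshot(history, {CODE: "jim"}))                # c3
    print(resolve_snapshot(history, {CODE: "joe", MOD1: "m1-c"}))  # None
```

A single-file tag always resolves, because the tagged version was committed at some point. A multi-file tag may not, and a real tool would have to decide whether to report that or fall back to some other rule.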
Why not use modules?
I can hear some idiots, ahem, people of less imagination, say "why not just use modules?"
I mean not the generic programming sense of dividing a program into modules and submodules, but modules and submodules specifically in the version control system, like git's submodules.
OK, I just used modules in the above example to simplify the discussion.
Modules as defined by the version control system have some overhead.
You have to set things up a priori.
That is frequently not good enough.
Many systems start off as a single file in some big parent repository, when they are not yet considered a separate module, and then evolve into a separate module.
Heck, they often start as a single function in a big_file_of_many_functions.c.
You then realize that this function and/or some near relatives should be in a file by themselves, like foo.c, and if you're in C/C++ you nearly always need a header file foo.h.
Eventually you realize that it would be much better to have a separate directory foo/ containing foo/foo.c and foo/foo.h.
And eventually you may add foo/Makefile and foo/tests/test1.py.
Somewhere along this evolution from a function within a bigger file of other functions to a subtree, you decide to give it its own version control system module or submodule.
That's great.
But it sure would be nice to have ways of referring (a) to this set of related things within the bigger repository, and (b) to two versions of this set of related things.
I say again:
Modules as defined by the version control system have some overhead.
Considerable overhead.
It is not just me saying this.
This is closely related to the monorepo versus multi-repo debate, e.g. https://johnclarke73.medium.com/mono-or-multi-repo-6c3674142dfc.
Some very important software developers and companies use monorepos, IMHO in part because git's and other VCSes' module systems are a hassle.
Consider a tree where every subdirectory tree is a separate module
I have long maintained a personal library.
It is a directory tree, deeply nested.
Nearly every subdirectory tree can be treated as an independent module.
Often individual files can be exported individually, e.g. header-only libraries in C or C++.
Usually I prefer to have a directory for each minimal logical subsystem, so that it can have separate makefiles and test scripts.
I often share this code with other projects, companies, employers...
These other projects seldom want to import my entire personal library tree.
Hence I want to be able to check out just a subset: sometimes a single file, most often a subdirectory tree, and sometimes a set of subdirectory trees that are required to work together.
These other projects often do not want to use my version control system. Too bad, that's life.
But sometimes they do.
Of course, modern DVCSes have very poor support for checking out arbitrary subsets, short of module support (see above).
It is not good when a sparse checkout, sometimes called a sparse branch (although the terminology quickly gets into the weeds of particular version control systems), carries history related to items that are not part of what was checked out.
Nor is it good if a sparse branch that does not carry that excess history and those excess objects cannot be merged back into a repository that has more objects, in parts of the file system that were not checked out, and more history.
When other projects are willing to share my version control system, it is really confusing for the items they have checked out to carry tags that are absolutely irrelevant to them.
Sometimes not just confusing.
Sometimes a security hole.
Anyway, in this great big personal monorepo
I like to tag the subsets checked out by other projects,
and imported back from those other projects.
But, again, those tags are irrelevant to the stuff that those other projects don't want to look at.
E.g. CVS tags: per-file, but easy to apply to multiple files
CVS tags actually are defined on a per-file basis. But it is common, and CVS makes it easy, to apply the same tag to multiple files, to a subtree of the repo, or to the entire repo.
E.g. if you just say cvs tag tagname in a particular directory, CVS applies the tag to all files in that directory and its subdirectories.
If you say cvs tag tagname at the top of your CVS checkout, the tag is applied to your entire repository.
Many CVS users have an alias or command that lets them tag the entire repository even from a subdirectory.
But you can also specify a single file, cvs tag tagname single-file, or multiple independent files and subdirectories, cvs tag tagname foo/file1-only bar/file2-only bazz-tree.
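The following toy Python model of the per-file CVS tagging just described shows how one command can tag a single file, several independent paths, or a whole subtree. It is a sketch, not how CVS actually stores tags. Its assumptions:

- files maps each path to the revision assumed to be checked out.
- The tag table maps each path to its own {tag: revision} dict, mirroring the per-file nature of CVS tags.
- A target of "." stands for running the command at the top of the checkout with no file arguments.

As in the real command, the tag name comes first and the file or directory targets follow.

```python
from typing import Dict, Iterable

Files = Dict[str, str]                # path -> checked-out revision, e.g. "1.3"
TagTable = Dict[str, Dict[str, str]]  # path -> {tag name -> revision}


def cvs_tag(files: Files, tags: TagTable, tagname: str, targets: Iterable[str]) -> None:
    """Roughly mimic `cvs tag TAGNAME target...`: a file target tags that file,
    a directory target tags every file beneath it, and "." tags everything."""
    targets = [t.rstrip("/") for t in targets]
    for path, rev in files.items():
        for t in targets:
            if t == "." or path == t or path.startswith(t + "/"):
                # The tag is recorded on this one file, at its own revision.
                tags.setdefault(path, {})[tagname] = rev
                break


if __name__ == "__main__":
    files = {"foo/file1-only": "1.2", "foo/other": "1.5",
             "bar/file2-only": "1.1", "bazz-tree/a/b.c": "1.7", "top.c": "1.3"}
    tags: TagTable = {}
    cvs_tag(files, tags, "joe-fix", ["foo/file1-only", "bar/file2-only", "bazz-tree"])
    cvs_tag(files, tags, "REL-1", ["."])
    print(sorted(p for p, t in tags.items() if "joe-fix" in t))
    # ['bar/file2-only', 'bazz-tree/a/b.c', 'foo/file1-only']
```

Because each file carries its own tag entries, nothing in this model (or in CVS) forces a tag to cover the whole tree. That is both the flexibility and the risk discussed below.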
Of course, CVS tags are not guaranteed to be consistent across the repository the way git tags are.
But it's a usage-model question.
Sometimes you want tags guaranteed to be consistent across the whole repository. Sometimes you don't.
Often one uses naming conventions to distinguish the two, plus tools that check whether those naming conventions are correctly applied, e.g. detecting that a supposedly whole-repository tag has not been applied to everything.
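Here is a minimal Python sketch of such a checking tool. It assumes a hypothetical convention: tags meant to cover the whole repository start with REPO-. It also assumes per-file tags are available as a mapping from path to {tag name: revision}. For every tag that follows the convention, it lists the files the tag is missing from. All other tags are treated as intentional subset tags and ignored.

```python
from typing import Dict, Iterable, List

TagTable = Dict[str, Dict[str, str]]  # path -> {tag name -> revision}


def missing_whole_repo_tags(all_files: Iterable[str], tags: TagTable,
                            prefix: str = "REPO-") -> Dict[str, List[str]]:
    """Map each supposedly whole-repo tag (name starts with `prefix`)
    to the sorted list of files that do not carry it."""
    all_files = list(all_files)
    names = {name for per_file in tags.values() for name in per_file}
    problems: Dict[str, List[str]] = {}
    for name in sorted(n for n in names if n.startswith(prefix)):
        missing = sorted(p for p in all_files if name not in tags.get(p, {}))
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    files = ["Makefile", "src/app.c", "tests/run.py"]
    tags: TagTable = {
        "Makefile": {"REPO-TestsRunTag": "1.4"},
        "src/app.c": {"REPO-TestsRunTag": "1.9",
                      "code.txt--update-by-jim-April1st": "1.8"},
        # tests/run.py was never tagged at all.
    }
    print(missing_whole_repo_tags(files, tags))
    # {'REPO-TestsRunTag': ['tests/run.py']}
```

The subset tag code.txt--update-by-jim-April1st is not flagged, because the convention says only REPO- tags must cover everything.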