My Git repo contains files for multiple publications. Each publication has a top-level file that references various lower-level files (chapters, topics, image files). For example,

book1.ditamap       (.ditamap == top-level map file)
book1/topicA.dita   (.dita == lower-level topic file)
book1/topicB.dita
book1/figureC.png    (image for figure)
book2.ditamap
book2/topicD.dita
book2/topicE.dita
book2/figureF.png
shared/shared_topicG.dita
shared/shared_topicH.dita

I would like to compute a single Git SHA that covers all files in a given publication (the top-level file and all of its dependency files). Unfortunately, it's not as simple as computing the SHA for a directory, because publications reuse files from other publications and some files live in shared directories.

How can I do this?

chrispitude

1 Answer

This solution is an extension of Andy Pryke's excellent answer in "How to compute the git hash-object of a directory?". It applies to any type of file for which dependencies exist (source code, publications, assets for a game).

First, write a script in your language of choice that returns all files related to a specified top-level file, one file name per line. For example,

% get_referenced_files.pl book1.ditamap
book1.ditamap
book1/topicA.dita
book1/topicB.dita
book1/figureC.png
shared/shared_topicH.dita
%
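
What the dependency script looks like depends entirely on your file format. As a rough sketch only (this is not my get_referenced_files.pl), the shell function below follows href="..." attributes breadth-first and resolves each reference relative to the file that contains it. The name list_referenced_files and the lrf_ variables are made up for illustration. It assumes it is run from the repository root and that references are plain relative paths; it ignores conref, keyref, and everything else a real DITA parser would handle.

```shell
# list_referenced_files TOP
# Hypothetical stand-in for a real dependency script: prints TOP and every
# file reachable through href="..." attributes, one name per line, in a
# stable breadth-first order.
list_referenced_files() {
    lrf_list=$(mktemp) || return 1
    printf '%s\n' "$1" > "$lrf_list"
    lrf_i=1
    # Walk the list while it grows; stop when line number lrf_i is empty.
    while lrf_file=$(sed -n "${lrf_i}p" "$lrf_list") && [ -n "$lrf_file" ]; do
        lrf_i=$((lrf_i + 1))
        case $lrf_file in
            *.dita|*.ditamap) ;;   # only these can reference other files
            *) continue ;;
        esac
        lrf_dir=$(dirname "$lrf_file")
        # Pull out href values, dropping any #fragment part.
        grep -o 'href="[^"#]*' "$lrf_file" | sed 's/^href="//' |
        while IFS= read -r lrf_ref; do
            [ -n "$lrf_ref" ] || continue
            case $lrf_ref in *://*) continue ;; esac   # skip external URLs
            # Resolve against the referencing file's directory, then strip
            # a leading "./" and collapse "dir/../" pairs.
            lrf_target=$(printf '%s\n' "$lrf_dir/$lrf_ref" |
                sed -e 's|^\./||' -e ':a' -e 's|[^/.][^/]*/\.\./||' -e 'ta')
            # Append only names that are not in the list yet.
            grep -qxF -- "$lrf_target" "$lrf_list" ||
                printf '%s\n' "$lrf_target" >> "$lrf_list"
        done
    done
    cat "$lrf_list"
    rm -f "$lrf_list"
}

# Usage (from the repository root):
#   list_referenced_files book1.ditamap
```

Whatever tool you use, it helps if the output order is deterministic, since the later steps hash the list in the order it arrives.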

Next, pipe this file list into the git hash-object --stdin-paths command, which prints the SHA of each file named on STDIN, one per line and in input order:

% get_referenced_files.pl book1.ditamap | git hash-object --stdin-paths
6b7d90792b9b9f40d553b19808978a78ba4994e5
03e54ce5a8880abed1daacb519122148f6b08373
aef9d70aac4fc5bde8527ca48e6cb44e4cc7083c
e93f704c031883f829a0c5ce4e01d16983ce709a
8d844111208f31be33855310663d625bdf4f37f6
%
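
The SHA list on its own does not show which line belongs to which file. If you want that for debugging, one possible sketch is to paste the names back next to the SHAs. The function name sha_table and the st_names variable are hypothetical.

```shell
# sha_table: read file names on STDIN, print "SHA<TAB>name" lines.
sha_table() {
    st_names=$(mktemp) || return 1
    cat > "$st_names"                      # keep a copy of the name list
    # Hash every named file, then join each SHA with its name.
    git hash-object --stdin-paths < "$st_names" | paste - "$st_names"
    rm -f "$st_names"
}

# Usage:
#   get_referenced_files.pl book1.ditamap | sha_table
```

Saving this table alongside each build and diffing two tables is a simple way to see which dependency caused a change.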

Finally, pass this SHA list into the git hash-object --stdin command, which computes the SHA for the literal string content read from STDIN (in this case, the individual per-file SHA values):

% get_referenced_files.pl book1.ditamap | git hash-object --stdin-paths | git hash-object --stdin
f30c715c5bc89846104d336b1ce7a5b95efa7fd8
%

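This final SHA value acts as a "fingerprint" for the specified top-level file and all its dependency files, as they currently exist in the working tree. If this fingerprint changes, then something somewhere in the set of files has changed. Because the fingerprint is a hash of the SHA list, it also depends on the order of that list, so the script should always emit the files in the same order.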
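
To act on a changed fingerprint, you can store the last value and compare against it. The following is only a sketch: fingerprint_files, has_changed, and the stamp file are hypothetical names, and the two-step form is used so that a failure in the first git command (for example, a missing file) is noticed instead of being hidden by the pipe.

```shell
# fingerprint_files: read file names on STDIN, print one combined SHA.
fingerprint_files() {
    # Per-file SHAs first; stop if any file cannot be hashed.
    ff_shas=$(git hash-object --stdin-paths) || return 1
    # Then hash the SHA list itself.
    printf '%s\n' "$ff_shas" | git hash-object --stdin
}

# has_changed STAMP_FILE: read file names on STDIN.
# Exit 0 if the fingerprint differs from the one saved in STAMP_FILE
# (or no stamp exists yet), 1 if it is unchanged, 2 if hashing failed.
# The new fingerprint is saved to STAMP_FILE on success.
has_changed() {
    hc_new=$(fingerprint_files) || return 2
    hc_old=$(cat "$1" 2>/dev/null)
    printf '%s\n' "$hc_new" > "$1"
    [ "$hc_new" != "$hc_old" ]
}

# Usage (the stamp file name is arbitrary):
#   get_referenced_files.pl book1.ditamap | has_changed .book1.sha &&
#       echo "book1 needs a rebuild"
```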

In my Perl script, I use the following one-liner to grab the SHA fingerprint for a particular publication (the /r substitution modifier requires Perl 5.14 or later):

my $sha = (`get_referenced_files.pl "$top_file" | git hash-object --stdin-paths | git hash-object --stdin` =~ s!\n$!!r);
chrispitude