
I'm trying to write a script that reads the output of git log and places it in an XML file.

Here is an example of the script.

#!/bin/bash
repo=(/srv/git/repositories)
list1=($repo/test.git)
cd "$list1"

echo '<?xml version="1.0" ?><rss version="2.0"><channel>' >> /tmp/test.xml
for i in $(git log --pretty=format:"%h")
do
   for e in $(git log | grep "Author:" | awk '{print $2}')
   do
      #for f in $(git log --pretty=format:"%cn")
      #do
         #for g in $(git log --pretty=format:"%cD")
         #do
         cat << EOF >> /tmp/test.xml
         <item><title>$i</title><description></description><author>$e</author><pubDate></pubDate></item>
         EOF
         #done
      #done
     done
done
echo '</channel></rss>' >> /tmp/test.xml

When I do this, each commit number and author is read and echoed multiple times, so I get an .xml file like this, with lots of copies of the same commit number:

<rss version="2.0">
<channel>
   <item>
      <title>906feb6</title>
      <description/>
      <author>test</author>
      <pubDate/>
   </item>
   <item>
      <title>906feb6</title>
      <description/>
      <author>test</author>
      <pubDate/>
   </item>
   <item>
      <title>906feb6</title>
      <description/>
      <author>test</author>
      <pubDate/>
   </item>
   <item>
      <title>**906feb6**</title>
       <description/>
       <author>test1</author>
       <pubDate/>
    </item>
    <item>
       <title>**906feb6**</title>
       <description/>
       <author>test1</author>
       <pubDate/>
    <item>
    <title>**ffb521e**</title>
       <description/>
       <author>test1</author>
       <pubDate/>
    </item>
</channel></rss>

What I want is that each commit number has an author, a description, and a publication date. But it has to get its information from those commands.

I want output like this. Could someone help?

<item>
   <title>906feb6</title>
   <description>test commit 1</description>
   <author>test1</author>
   <pubDate>Mar, 18</pubDate>
<item>
   <title>**ffb521e**</title>
   <description>test commit 2</description>
   <author>test2</author>
   <pubDate>Mar, 18</pubDate>
</item>
tripleee
darkM
  • Those first two assignments are array assignments that just happen to work out the way you want because accessing `$array` is the same as accessing `${array[0]}`. But if those aren't supposed to be arrays you should drop the `()` around the right-hand side and if they are supposed to be arrays then your script should use them as such. – Etan Reisner Mar 18 '15 at 16:36
  • You aren't limiting the second `git-log` command to the commit you are looping over. Also you [**should not** read lines of data using `for`](http://mywiki.wooledge.org/DontReadLinesWithFor). – Etan Reisner Mar 18 '15 at 16:38
  • Could you give me an example how to do it properly ? – darkM Mar 18 '15 at 16:44
  • See the link in the first paragraph on the page I linked to. Pay specific attention to the NUL/null discussion and see the `-z` argument to many git commands. – Etan Reisner Mar 18 '15 at 16:45
  • Ha ok, well did some reading off arrays. `readarray -t test2 <<<"$(git log --pretty=format:"%cn")"` `i=0; for item in "${test1[@]}"; do #printf '%s\n' ${test1[$i]} ${test2[$i]}; echo "" ${test1[$i]} bla ${test2[$i]}"" >> /tmp/test.xml; let "i=i+1" done ` Do not know if this is the best way..!!! – darkM Mar 19 '15 at 09:52
  • Possible duplicate of [How to Lex, Parse, and Serialize-to-XML Email Messages using Alex and Happy](http://stackoverflow.com/questions/17354442/how-to-lex-parse-and-serialize-to-xml-email-messages-using-alex-and-happy) – Paul Sweatte Jun 16 '16 at 18:18
  • The first `<item>` in your example output seems to lack the closing `</item>` tag. – tripleee Jun 17 '16 at 05:45

2 Answers


As @EtanReisner pointed out, you are looping over all commits in the inner loop, not just the one that the outer for loop is handling.
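
To see why the nested loops multiply the output, here is a minimal sketch that needs no repository. The two lists are hypothetical stand-ins for what the two git log commands print; because the inner list does not depend on the outer variable, every hash gets paired with every author.

```shell
#!/bin/sh
# Hypothetical stand-ins for the output of the two git log commands.
hashes='906feb6 ffb521e'
authors='test test1'

count=0
for i in $hashes; do
  # The inner list does not depend on $i, so every hash meets every author.
  for e in $authors; do
    count=$((count + 1))
    echo "<item><title>$i</title><author>$e</author></item>"
  done
done
echo "items written: $count"
```

Two hashes and two authors give four items instead of two, which is the duplication seen in the question.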

Here's how to avoid the for loops, and fix that problem.

#!/bin/sh
echo '<?xml version="1.0" ?><rss version="2.0"><channel>' > /tmp/test.xml  # Make sure we start with an empty file
# tformat: ends every entry with a newline, so read also sees the oldest commit
git log --pretty=tformat:"%h" |
while read -r i; do
   # Presumably you want a single commit here
   # See also https://stackoverflow.com/a/4082178/874188
   # Also avoid Useless Use of grep
   e=$(git log "$i" -1 | awk '/^Author:/{print $2}')
   cat <<____EOF >> /tmp/test.xml
     <item><title>$i</title><description></description><author>$e</author><pubDate></pubDate></item>
____EOF
done
echo '</channel></rss>' >> /tmp/test.xml

I see no reason to put the repo in two (sic!) arrays at the start. (If it's just a single value, why use arrays at all?) Just run this in whichever repo you want to process.

With that out of the way, there are no Bashisms in this script, so I changed the shebang to #!/bin/sh instead.
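
A caveat about `while read` loops in general: `read` reports failure for a final line that lacks a trailing newline, so the loop body is skipped for that line. `git log --pretty=format:...` puts newlines between entries rather than after each one, so the oldest commit can be affected, whereas `tformat:` terminates every entry. The sketch below imitates this with `printf` instead of a real repository; the function names are made up for illustration.

```shell
#!/bin/sh
# Count input lines with a plain `while read` loop.
count_lines_plain() {
  n=0
  while read -r line; do
    n=$((n + 1))
  done
  echo "$n"
}

# Same loop, but also accept a final line without a trailing newline.
count_lines_guarded() {
  n=0
  while read -r line || [ -n "$line" ]; do
    n=$((n + 1))
  done
  echo "$n"
}

# Three fake hashes; the last one has no newline after it.
plain=$(printf 'aaa\nbbb\nccc' | count_lines_plain)
guarded=$(printf 'aaa\nbbb\nccc' | count_lines_guarded)
echo "plain=$plain guarded=$guarded"
```

This prints `plain=2 guarded=3`: the plain loop loses the unterminated last line, and the `|| [ -n "$line" ]` guard recovers it.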

To get the description etc into the snippet as well, maybe something like this instead (wrapped for legibility; should be just one line):

git log "$i" -1 --format=format:"<item>%n <title>%h</title>
    %n <description>%s</description>%n <author>%an</author>
    %n <pubDate>%ad</pubDate>%n</item>"

(2021 update)

... but I'm guessing you actually want simply something like

#!/bin/sh
echo '<?xml version="1.0" ?><rss version="2.0"><channel>' > /tmp/test.xml
git log --format=format:"<item>%n <title>%h</title>%n <description>%s</description>%n <author>%an</author>%n <pubDate>%ad</pubDate>%n</item>" >>/tmp/test.xml
# TODO: escape XML specials
echo '</channel></rss>' >> /tmp/test.xml

Escaping XML characters when you already have (possibly malformed) XML fragments is kind of tricky, so maybe replace the git log --format with something simpler you can feed to Awk or Perl for further processing. For robustness, you might want to output null-separated fields, but regular non-GNU Awk can't reliably handle those, so then maybe use Perl instead. Here's a slightly quick and dirty attempt:

git log --format=format:'title:%h%x00description:%s%x00author:%an%x00pubDate:%ad%x00' |
perl -0ne 'BEGIN {
    # qq{} lets the XML declaration keep its double quotes
    print(qq{<?xml version="1.0" ?><rss version="2.0"><channel>\n});
  }
  chomp;    # remove the NUL that terminates each field
  s/^\n//;  # remove the newline git log puts between commits
  next unless /^([^:]+):(.*)/s;
  $f[$i] = $1; $fld[$i] = $2;
  # Escape XML specials &<>
  $fld[$i] =~ s/&/\&amp;/g; $fld[$i] =~ s/</\&lt;/g; $fld[$i] =~ s/>/\&gt;/g;
  # Print when we have gathered a full record
  if ($i++ == 3) { print "<item>\n";
    for my $field (0..$i-1) {
      print("  <$f[$field]>$fld[$field]</$f[$field]>\n");
    } print "</item>\n";
  @f = @fld = (); $i = 0 }
END { print("</channel></rss>\n"); }' >/tmp/test.xml
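
If Perl is not an option, the escaping step on its own can be sketched with sed. This is only an illustration for element content: `xml_escape` is a made-up helper name, it handles just `&`, `<` and `>`, and the ampersand has to be replaced first so the entities produced by the later substitutions are not escaped again.

```shell
#!/bin/sh
# Hypothetical helper: escape the XML specials &, < and > on stdin.
# The ampersand rule must come first.
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# A made-up commit subject containing all three specials.
escaped=$(printf '%s\n' 'Fix <b> & "c" > d' | xml_escape)
echo "$escaped"
```

This prints `Fix &lt;b&gt; &amp; "c" &gt; d`. Text placed inside attribute values would need the quote characters escaped as well.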
tripleee
  • Getting just the second field out of the `Author:` header is an oversimplification, but I'm not attacking that here. Also, if the value from the Author: header contains anything which needs XML escaping, you need to take care of that, too. – tripleee Jun 17 '16 at 05:43
  • I might've done something wrong, but in my case `while read -r i` loop was missing the oldest commit, so I had to replace it with `for i in` loop and that got it. Nevertheless, +1 for the answer. – retif Oct 01 '21 at 19:17
  • @retif [Don't read lines with `for`.](https://mywiki.wooledge.org/DontReadLinesWithFor) I'm guessing your last line lacks a newline, though I can't imagine how you could end up with `git` doing that. Are you doing other manipulations which wreck the format? There's a hack for getting `read` to succeed without a final newline, something like `while read -r l || [ -n "$l" ]`; see also https://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line – tripleee Oct 02 '21 at 08:03
  • Right, I used bare `$i`, whereas I should've used it in quotes (`"$i"`). Having it quoted I get the oldest commit too. Thank you for pointing out to not reading lines with `for` (*and also for the updated variant without loop*). – retif Oct 02 '21 at 11:33
  • @retif I don't really see where you would have done that, but all's well that ends well. See also [When to wrap quotes around a shell variable](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) and the updated answer now with XML escaping. – tripleee Oct 03 '21 at 18:27
  • You are right again, quotes had nothing to do with it. It's the `[ -n "$l" ]` "hack" that made it work (*to get the last line, which is the oldest commit and which doesn't have a newline following it*). – retif Oct 04 '21 at 09:34

If it's an XML file you want to end up with, then please use an XML parser, like xidel, to create valid XML:

$ git log --pretty=format:'%h%x09%s%x09%an%x09%ad' | \
  xidel -se '(
  <rss version="2.0"><channel>{
    for $x in x:lines($raw) let $a:=tokenize($x,"\x09","!") return
    <item>
      <title>{$a[1]}</title>
      <description>{$a[2]}</description>
      <author>{$a[3]}</author>
      <pubDate>{$a[4]}</pubDate>
    </item>
  }</channel></rss>
)' --output-node-format=xml --output-node-indent

Or with "computed constructors":

$ git log --pretty=format:'%h%x09%s%x09%an%x09%ad' | \
  xidel -se '
  element rss {
    attribute version {"2.0"},
    element channel {
      for $x in x:lines($raw) let $a:=tokenize($x,"\x09","!") return
      element item {
        element title {$a[1]},
        element description {$a[2]},
        element author {$a[3]},
        element pubDate {$a[4]}
      }
    }
  }
' --output-node-format=xml --output-node-indent

With https://github.com/benibela/xidel.git for example:

$ git log -3 --pretty=format:'%h%x09%s%x09%an%x09%ad'
c1c75e2 add --output-key-order option   benibela    Mon Aug 23 00:33:25 2021 +0200%
49de5bc use new JSON serializer, gh close #71   benibela    Mon Aug 23 00:24:31 2021 +0200%
82a403a hardcode type information for interpreted functions, so they do not need to be parsed at startup    benibela    Wed Aug 18 08:24:58 2021 +0200%

The latest 3 commits with a TAB-character between each item.

$ git log -3 --pretty=format:'%h%x09%s%x09%an%x09%ad' | \
  xidel -se '[...]' --output-node-format=xml --output-node-indent
<rss version="2.0">
  <channel>
    <item>
      <title>c1c75e2</title>
      <description>add --output-key-order option</description>
      <author>benibela</author>
      <pubDate>Mon Aug 23 00:33:25 2021 +0200</pubDate>
    </item>
    <item>
      <title>49de5bc</title>
      <description>use new JSON serializer, gh close #71</description>
      <author>benibela</author>
      <pubDate>Mon Aug 23 00:24:31 2021 +0200</pubDate>
    </item>
    <item>
      <title>82a403a</title>
      <description>hardcode type information for interpreted functions, so they do not need to be parsed at startup</description>
      <author>benibela</author>
      <pubDate>Wed Aug 18 08:24:58 2021 +0200</pubDate>
    </item>
  </channel>
</rss>

for $x in x:lines($raw) let $a:=tokenize($x,"\x09","!") iterates over each line and, for each one, creates a variable $a which holds a sequence of that line's items (by splitting the line on the TAB characters).
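
For comparison, the same split-on-TAB step can be sketched in plain shell. This is not xidel and does no XML escaping; it only shows how one TAB-separated line is broken into four fields. The sample line is made up, and the variable names are chosen for illustration.

```shell
#!/bin/sh
# A literal TAB character, produced portably.
tab=$(printf '\t')

# One made-up line in the same hash/subject/author/date layout.
out=$(printf 'c1c75e2\tadd option\tbenibela\tMon Aug 23\n' |
  while IFS="$tab" read -r hash subject author date; do
    # Spaces inside a field survive because only TAB is in IFS.
    printf '%s|%s|%s|%s\n' "$hash" "$subject" "$author" "$date"
  done)
echo "$out"
```

This prints `c1c75e2|add option|benibela|Mon Aug 23`. One limitation: TAB counts as IFS whitespace, so consecutive TABs (an empty field) would collapse into one separator.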

Reino