Problem: parse the output of git log, which has a structured commit message, into an object.

This is what the log looks like for this particular directory:

commit 7df03ac69f27f80887cc588ab7bec7e38a42d3db
Author: John Doe <johndoe@yahoo.com>
Date:   Thu Apr 6 19:43:18 2017 +0200

    VAL_A "Something"
    VAL_B "Goodbye"
    OPTIONALVAL_1 "Hello World \n"

commit 9d9c69f2798778yyuyuu6786767tc7e38a42d3db
Author: John Doe <johndoe@yahoo.com>
Date:   Thu Apr 6 19:43:18 2017 +0200

    VAL_A "Hello World"
    VAL_B "Goodbye World"
    OPTIONALVAL_2 "Hello again World \n"

commit 666669f2798778yyuyuu6786767tc7e38a42d3db
Author: John Doe <johndoe@yahoo.com>
Date:   Thu Apr 6 19:43:18 2017 +0200

    VAL_A "Hello World"
    VAL_B "Goodbye World"

Each commit should be parsed into an object with the member variables git_commit_hash, VAL_A, VAL_B, OPTIONALVAL_1, and OPTIONALVAL_2. The optional values may be empty, but VAL_A and VAL_B may not.

My approach:

  1. Dump the git log output into a temporary file.
  2. Read the file line by line. If a line starts with the word "commit", save the characters after the space into git_commit_hash of a new object.
  3. Skip the next three lines.
  4. Save the two mandatory values, VAL_A and VAL_B.
  5. Since the values may overflow onto the next line, keep reading and check whether a line begins with OPTIONALVAL_1 or OPTIONALVAL_2; if so, save it.
  6. Stop parsing the current object once the word "commit" is reached. Create a new object, then repeat steps 2-5.
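
For illustration, the steps above could be sketched in C++17 roughly as follows. This is only a sketch under assumptions: the names CommitRecord, quoted_value and parse_git_log are hypothetical, every value is assumed to fit on one line (overflowing values are not handled), and instead of skipping a fixed number of lines it dispatches on the first word of each line, so the Author and Date lines are simply ignored.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the field names follow the question.
struct CommitRecord {
    std::string git_commit_hash;
    std::string val_a;                         // mandatory
    std::string val_b;                         // mandatory
    std::optional<std::string> optionalval_1;  // unset if absent
    std::optional<std::string> optionalval_2;  // unset if absent
};

// Returns the text between the first and last double quote on a line,
// or an empty string if the line holds no quoted value.
inline std::string quoted_value(const std::string& line) {
    const auto first = line.find('"');
    const auto last = line.rfind('"');
    if (first == std::string::npos || last <= first) return {};
    return line.substr(first + 1, last - first - 1);
}

// Parses default `git log` output from any stream (a file, a string, a pipe).
// Assumption: each value sits on a single line.
inline std::vector<CommitRecord> parse_git_log(std::istream& in) {
    std::vector<CommitRecord> commits;
    std::string line;
    while (std::getline(in, line)) {
        // Header lines start at column 0; message lines are indented.
        if (line.rfind("commit ", 0) == 0) {
            CommitRecord record;
            record.git_commit_hash = line.substr(7);
            commits.push_back(record);
            continue;
        }
        if (commits.empty()) continue;  // nothing to attach values to yet

        // The first whitespace-delimited word decides what the line holds.
        std::istringstream words(line);
        std::string key;
        words >> key;

        CommitRecord& current = commits.back();
        if (key == "VAL_A") current.val_a = quoted_value(line);
        else if (key == "VAL_B") current.val_b = quoted_value(line);
        else if (key == "OPTIONALVAL_1") current.optionalval_1 = quoted_value(line);
        else if (key == "OPTIONALVAL_2") current.optionalval_2 = quoted_value(line);
        // Any other line (Author:, Date:, blank) is ignored.
    }
    return commits;
}
```

Passing a std::ifstream opened on the temporary file from step 1 to parse_git_log would yield one CommitRecord per commit; a record whose val_a or val_b is still empty afterwards violates the "mandatory" rule and can be rejected by the caller.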

This brute-force approach works somewhat, but it has no flexibility. Could anyone point me towards a more elegant solution, or a C++ or Boost library that would help? Thanks.

  • 1
    How about creating a struct or class that holds all the information, no skipping anything. Then it is as simple as picking out of the struct the information you need. One pass reading the file into the array, vector, list, whatever of the structs, and then everything is memory based, not disk based. – PaulMcKenzie Apr 06 '17 at 18:08

1 Answer

You could define your own git log output format using pretty formats, like this:

git log --pretty=format:"<your formatting>"

If you emit each element in an easy-to-parse form (e.g. define an XML layout containing the data you need and then use boost::property_tree to extract it), you can read the output and know exactly where each piece of information is, without parsing the whole default log.
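
As one possible illustration of this idea, here is a sketch that swaps XML and boost::property_tree for plain separator bytes and the standard library only. It assumes the log was produced with git log --pretty=format:"%H%x1f%B%x1e", where %H is the commit hash, %B is the raw message body, and %x1f / %x1e print the ASCII unit and record separator bytes. The names FormattedCommit and parse_formatted_log are hypothetical, and each KEY "value" pair is assumed to sit on a single line.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: the hash plus every KEY "value" pair in the message.
struct FormattedCommit {
    std::string hash;
    std::map<std::string, std::string> values;
};

// Parses text assumed to come from:
//   git log --pretty=format:"%H%x1f%B%x1e"
// Records end with byte 0x1e; hash and body are split by byte 0x1f.
inline std::vector<FormattedCommit> parse_formatted_log(const std::string& text) {
    std::vector<FormattedCommit> commits;
    std::istringstream records(text);
    std::string record;
    while (std::getline(records, record, '\x1e')) {
        // git separates commits with a newline, so skip leading line breaks.
        const auto start = record.find_first_not_of("\r\n");
        const auto sep = record.find('\x1f');
        if (start == std::string::npos || sep == std::string::npos) continue;

        FormattedCommit commit;
        commit.hash = record.substr(start, sep - start);

        // Each body line is expected to look like: KEY "value"
        std::istringstream body(record.substr(sep + 1));
        std::string line;
        while (std::getline(body, line)) {
            const auto key_end = line.find(' ');
            const auto open = line.find('"');
            const auto close = line.rfind('"');
            if (key_end == std::string::npos || open == std::string::npos ||
                close <= open) {
                continue;  // not a KEY "value" line
            }
            commit.values[line.substr(0, key_end)] =
                line.substr(open + 1, close - open - 1);
        }
        commits.push_back(std::move(commit));
    }
    return commits;
}
```

With this layout the Author and Date lines never appear in the output, so nothing has to be skipped, and a missing optional value simply shows up as a key that is absent from the map.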

More info in this thread: Git log output to XML, JSON, or YAML?

Community
Rogus