I'm trying to retrieve and work with data from historical versions of files in a git repo. I'd like to have something like a dictionary that holds <hash>, <time of commit>, <value retrieved from contents of a file revision>, and <commit message> for each entry.
I figured the data I retrieve from each file revision, and any calculations done with it, would be best handled in Python, and the subprocess module appeared to be the best fit for integrating my git commands.
Below I show how I'm defining a function getval(key, filename) that I had hoped would print <SHA-1 hash>:<Value> to the console. What I would really like is a dict with more info, including <time> and <commit message>.
I help operate an ion accelerator, where we store 'savesets' (values relevant to a given accelerator tune) using git. Among the values in these files are charge (Q) and mass (A). Ultimately, I want to retrieve both values, compute the ratio Q/A, and display a list of file revision hashes sorted by the charge-to-mass ratio of the ion we delivered with the settings in that revision of the file.
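To make the end goal concrete, here is a minimal sketch of the final sorting step, assuming I already have Q and A for each revision. The `revisions` dict, its short hashes, and all values except the 56Fe17+ pair are made-up placeholders, not data from my repo.

```python
# Hypothetical input: revision hash -> (charge Q, mass A).
# Hashes and values are placeholders for illustration only.
revisions = {
    "aaa111": (17.0, 56.0),  # 56Fe17+
    "bbb222": (20.0, 58.0),
    "ccc333": (9.0, 36.0),
}

def sort_by_charge_to_mass(revs):
    """Return (hash, Q/A) pairs sorted by ascending Q/A ratio."""
    ratios = [(sha, q / a) for sha, (q, a) in revs.items()]
    return sorted(ratios, key=lambda pair: pair[1])

for sha, ratio in sort_by_charge_to_mass(revisions):
    print(f"{sha}  Q/A = {ratio:.4f}")
```

The open question is how to fill a dict like `revisions` from the git history.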
Sample of file (for 56Fe17+):
# Date: 2018-12-21 01:49:16.888
PV,SELECTED,TIMESTAMP,STATUS,SEVERITY,VALUE_TYPE,VALUE,READBACK,READBACK_VALUE,DELTA,READ_ONLY
REA_EXP:LINE,0,1544047322.881066957,NO_ALARM,NONE,enum,"JENSA~[UDF;AT-TPC;GPL;JENSA]",,"---",,true
REA_BTS19:BEAM:OPTICSFILE,0,1541798820.065952460,NO_ALARM,NONE,string,"BTS19_test3.data",,"---",,true
REA_BTS19:BEAM:A_BOOK,0,1545322510.562031883,NO_ALARM,NONE,double,"56.0",,"---",,true
REA_BTS19:BEAM:Z_BOOK,0,1545322567.544226340,NO_ALARM,NONE,double,"26.0",,"---",,true
REA_BTS19:BEAM:Q_BOOK,0,1545322512.701768974,NO_ALARM,NONE,double,"17.0",,"---",,true
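For a single revision's contents, pulling out Q and A looks straightforward with the csv module. This is a rough sketch, assuming every snapshot has the same layout as the sample (a `#` comment line, then a header row, then one row per PV); `parse_snapshot` is a name I made up.

```python
import csv
import io

def parse_snapshot(text):
    """Map each PV name to its VALUE field (as a string)."""
    # Drop blank lines and '#' comment lines; the first remaining
    # line is assumed to be the header row (PV,SELECTED,...).
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    return {row["PV"]: row["VALUE"] for row in reader}

sample = '''# Date: 2018-12-21 01:49:16.888
PV,SELECTED,TIMESTAMP,STATUS,SEVERITY,VALUE_TYPE,VALUE,READBACK,READBACK_VALUE,DELTA,READ_ONLY
REA_BTS19:BEAM:A_BOOK,0,1545322510.562031883,NO_ALARM,NONE,double,"56.0",,"---",,true
REA_BTS19:BEAM:Q_BOOK,0,1545322512.701768974,NO_ALARM,NONE,double,"17.0",,"---",,true
'''

values = parse_snapshot(sample)
q = float(values["REA_BTS19:BEAM:Q_BOOK"])
a = float(values["REA_BTS19:BEAM:A_BOOK"])
print(q / a)  # charge-to-mass ratio for this revision
```

The csv reader strips the surrounding double quotes, so the values convert directly with float().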
So far, with the help of others here, I've figured out a git one-liner that greps the revision history of a given file for a key (a string) and uses sed and awk to output <hash>:<val associated with the key>.
Git Oneliner I'm Starting with:
git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) -- ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp | sed 's/:/,/' | awk -F, '{print $1 ":" $8}'
Oneliner's Output
e78f73fe6f90e93d5b3ccf90975b0e540d12ce09:"56.0"
4b94745bd0a6594bb42a774c95b5fc0847ef2d82:"56.0"
f2d5e263deac1d9112be791b39f4ce1b1b34e55d:"56.0"
c03800de52143ddb2abfab51fcc665ff5470e363:"56.0"
4a3a564a6d87bc6ff5f3dc7fec7670aeecfe6a79:"58.0"
d591941e51c4eab1237ce726a2a49448114b8f26:"58.0"
a9c8f5cdf224ff4fd94514c33888796760afd792:"58.0"
2f221492beea1663216dcfb27da89343817b11fd:"58.0"
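The sed and awk steps could presumably be replaced by a few lines of Python once I have the raw `git grep` output. A minimal sketch, assuming each output line has the form <rev>:<path>:<matched line> and that the path contains no colon; `parse_grep_line` is a hypothetical helper name.

```python
import csv

def parse_grep_line(line):
    """Split one `git grep` output line into (hash, value)."""
    # The matched line itself contains colons (REA_BTS19:BEAM:...),
    # so split at most twice: rev, path, remainder.
    sha, _path, content = line.split(":", 2)
    fields = next(csv.reader([content]))
    return sha, fields[6]  # index 6 is the VALUE column

line = ('522628b8d3db01ac330240b28935933b0448649c:'
        'ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:'
        'REA_BTS19:BEAM:A_BOOK,0,1545240215.743201855,'
        'NO_ALARM,NONE,double,"58.0",,"---",,true')
print(parse_grep_line(line))
```

Unlike the awk version, this returns the value without the surrounding double quotes, because the csv reader removes them.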
I've also started playing with Python's subprocess module, but I'm struggling to figure out how to handle my more complicated git commands. Generally, I want to be able to pass a key and a file, something like getval(key, filename).
When my cmd list was ['git', 'grep', str, '$(git rev-list --all)', '--', pathspec], git returned an error stating that '$(git rev-list --all)' was ambiguous. Thinking it wasn't being expanded, I added a separate process to execute the nested command, but I'm not sure I'm doing this correctly.
My Python file (gitfun.py), which I'm currently running the function from:
import sys, os
import subprocess

def getval(str, pathspec, repoDir='/mnt/d/stash.projects/rea'):
    p1 = subprocess.Popen(["git", "rev-list", "--all"], stdout=subprocess.PIPE)
    output, err = p1.communicate()
    cmd = ['git', 'grep', str, output, '--', pathspec]
    p2 = subprocess.Popen(cmd, cwd=repoDir)
    p2.wait()

cwd = '/mnt/d/stash.projects/rea'
filename = 'ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp'
os.chdir(cwd)
getval('BTS19:BEAM:A_BOOK', filename)
Currently it returns 'file name too long'. Even though I'm not convinced the file name really is too long, I tried setting core.longpaths to true in my git config, but this had no effect. This is another reason I suspect I'm not handling the replacement of the $(git rev-list --all) expansion correctly.
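One guess at what is going wrong: output holds the entire rev-list result as a single bytes object, so git receives one enormous argument instead of one argument per hash. Below is a minimal sketch of splitting it, assuming that diagnosis is right. `build_grep_cmd` and `getval_lines` are names I made up for illustration, and I have not confirmed this against my repo.

```python
import subprocess

def build_grep_cmd(key, rev_list_output, pathspec):
    """Build a `git grep` argv with one argument per revision hash."""
    # rev-list prints one hash per line; communicate() returns bytes
    # unless text mode is enabled, so decode before splitting.
    if isinstance(rev_list_output, bytes):
        rev_list_output = rev_list_output.decode()
    revs = rev_list_output.split()
    return ["git", "grep", key, *revs, "--", pathspec]

def getval_lines(key, pathspec, repo_dir):
    """Run both git commands in repo_dir and return grep's output lines."""
    revs = subprocess.run(
        ["git", "rev-list", "--all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # No check=True here: git grep exits with status 1 when nothing matches.
    result = subprocess.run(
        build_grep_cmd(key, revs, pathspec),
        cwd=repo_dir, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

With a very long history, the argument list could still hit the operating system's command-length limit, which the shell one-liner would presumably run into as well.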
For this code, I expect something that looks like this:
522628b8d3db01ac330240b28935933b0448649c:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.743201855,NO_ALARM,NONE,double,"58.0",,"---",,true
2557c599d2dc67d80ffc5b9be3f79899e0c15a10:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.743201855,NO_ALARM,NONE,double,"58.0",,"---",,true
7fc97ec2aa76f32265196c42dbcd289c49f0ad93:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.743201855,NO_ALARM,NONE,double,"58.0",,"---",,true
...
But I ultimately want an output to console that looks identical to the git one-liner above, or better yet, a dict that I can print to console or do other things with.
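For the dict itself, this is roughly the shape I have in mind. The sketch assumes the commit time and message come from `git log` run with a custom `--format` string, and that the values have already been collected per hash; `LOG_FORMAT`, `parse_log_output`, and `merge_values` are hypothetical names, and the demo input is a placeholder.

```python
from datetime import datetime, timezone

# Assumed format string for `git log --format=...`: full hash,
# committer timestamp (Unix seconds), subject line, NUL-separated.
LOG_FORMAT = "%H%x00%ct%x00%s"

def parse_log_output(log_output):
    """Map commit hash -> {'time': datetime, 'message': str}."""
    commits = {}
    for line in log_output.split("\n"):
        if not line:
            continue
        sha, ts, subject = line.split("\x00", 2)
        commits[sha] = {
            "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),
            "message": subject,
        }
    return commits

def merge_values(commits, values):
    """Attach a retrieved value to each commit that has one."""
    return {sha: {**meta, "value": values[sha]}
            for sha, meta in commits.items() if sha in values}

# Placeholder input standing in for real `git log` output.
demo_log = "abc123\x001545322510\x00Update tune\n"
entries = merge_values(parse_log_output(demo_log), {"abc123": "56.0"})
print(entries)
```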