
I'm trying to figure out whether a Git version-controlled repository of text files can theoretically replace a paper lab notebook for intellectual property purposes.

In a paper notebook, the way this works is you write down your results every day, and one or two people in your lab sign off on the notes and cross out any white space. In theory, this is supposed to indicate that you have a perfect record of what you did that day, and you can't add any stuff to it after the date.

The way I'm thinking this could be implemented in git is by having a repo with the experimental results (i.e. the lab notebook) that gets pushed to a private shared repo on GitHub, and ... somehow two other people sign off on it? (Suggestions on how to do this?)

My main concern: is it possible (and, if so, what code would need to be run) to completely change the contents and timestamp of a particular text file without leaving a trace in the overall commit history?

dvanic
  • The requirement with which I'm struggling is the persistence of all data from one commit to the next. Normally, Git allows any part of a file to be modified in a commit, and so experimental data could be deleted by a reviewer, either intentionally or unintentionally. – Tim Biegeleisen Feb 05 '16 at 05:05
  • In other words, a page from a lab notebook actually has state from _several_ "commits" (originator, reviewers, and modifications), whereas in Git a single commit represents a page in a single state. – Tim Biegeleisen Feb 05 '16 at 05:07
  • @TimBiegeleisen I'm a bit confused about your question. So in a perfect world (and theoretically in the current paper-based world), a lab notebook has a commit for every day by the originator. The reviewers just do the equivalent of confirming the timestamp with their signature. – dvanic Feb 05 '16 at 06:11
  • @TimBiegeleisen In a digital, more reasonable world, what I would think of doing would be to follow that commit-confirm pattern for most notes, but in situations where I made a mistake (or might not have clarified everything about my experimental setup as I should have) I can edit the file for that particular date, and then have the new timestamp confirmed by the reviewers again, with the one for that day. – dvanic Feb 05 '16 at 06:14
  • Have a look at the answers given below. Git can probably simulate how a lab notebook would behave, but each commit won't necessarily have the full history of what happened in the lab. Rather, you will have to look at groups of commits to achieve that. – Tim Biegeleisen Feb 05 '16 at 06:17
  • My question pertains to whether down the track, if I want to for nefarious purposes, for example, I can modify the history somehow to change the contents and datestamp. – dvanic Feb 05 '16 at 06:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/102658/discussion-between-tim-biegeleisen-and-dvanic). – Tim Biegeleisen Feb 05 '16 at 06:20

2 Answers

Each git commit contains a link back to the previous git commit (see Git Internal - Commit Objects). If you change a commit in the past, you have to change every commit that came after it. This will cause issues when you push the updated commits, because the rewritten history no longer matches what the server already has.
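To illustrate why changing an old commit forces every later commit to change, here is a minimal sketch of a hash chain. It is a simplified model, not git's actual object format: the `commit_id` and `build_history` helpers and the notebook entries are made up for illustration. The only point it shares with git is that each ID is a SHA-1 over data that includes the parent's ID.

```python
import hashlib


def commit_id(parent_id: str, content: str, date: str) -> str:
    """Hash a toy commit; the parent's ID is part of the hashed data."""
    payload = f"parent {parent_id}\ndate {date}\n\n{content}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_history(entries):
    """Return the commit IDs for (date, content) entries, oldest first."""
    ids = []
    parent = ""  # the first toy commit has no parent
    for date, content in entries:
        parent = commit_id(parent, content, date)
        ids.append(parent)
    return ids


# Hypothetical notebook entries, one per day.
notebook = [
    ("2016-02-03", "Day 1: prepared samples"),
    ("2016-02-04", "Day 2: ran assay, yield 40%"),
    ("2016-02-05", "Day 3: repeated assay"),
]
original = build_history(notebook)

# Tamper with day 2 only, keeping its recorded date.
tampered_notebook = list(notebook)
tampered_notebook[1] = ("2016-02-04", "Day 2: ran assay, yield 90%")
tampered = build_history(tampered_notebook)

print(original[0] == tampered[0])  # day 1 ID is unchanged
print(original[1] == tampered[1])  # day 2 ID differs
print(original[2] == tampered[2])  # day 3 ID differs too, via its parent link
```

Anyone who recorded the ID of the latest commit before the tampering would see that it no longer matches.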

So it is possible to rewrite the history in git. However, with a suitably set up git server, you can have some confidence that the history can't be changed. Something like this question.

Community

The thing you are aiming for is indeed possible, and it is at most as secure as SHA-1 itself.

The way to implement this is through digital signatures (probably via annotated tags) that act as the signer's endorsement of a particular SHA-1 ID. That SHA-1 would be the SHA-1 for the most recent update commit, which (via git's usual Merkle tree of commit IDs) automatically covers all previous SHA-1s as well (not that this matters unless each day is stored separately).

The usual annotated tag signatures use GPG keys, which are likely more secure than the SHA-1s themselves. SHA-1 was originally intended to be quite resistant against what is called a second-preimage attack (basically, an attacker who knows the original message and its hash comes up with a second, different message that has the same hash). Some attack methods for finding collisions were discovered in 2005, though, and as of 2014 SHA-1 was de-certified for US Federal digital signature purposes.

In any case using git is not required: you can simply have your certifiers store signatures for each day's file(s). The signatures themselves can be stored however you like (perhaps via git); they are independent of any versioning. Using SHA-256 or something stronger is probably wise though.
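As a rough sketch of the git-free approach, the snippet below builds a SHA-256 manifest for one day's files. The idea is that each certifier would then sign that manifest with a tool such as GPG; the signing step itself is not shown here. The function names, the file name, and the manifest layout are assumptions chosen for illustration.

```python
import hashlib


def digest_manifest(files: dict) -> str:
    """Build one 'sha256-hex  filename' line per file, sorted by name."""
    lines = []
    for name in sorted(files):
        digest = hashlib.sha256(files[name]).hexdigest()
        lines.append(f"{digest}  {name}")
    return "\n".join(lines) + "\n"


def verify_manifest(manifest: str, files: dict) -> bool:
    """Check that the files still hash to exactly what the manifest recorded."""
    return manifest == digest_manifest(files)


# Hypothetical notebook file for one day, held in memory as bytes.
day = {"2016-02-05.txt": b"Ran assay; yield 40%.\n"}
manifest = digest_manifest(day)  # this text is what a certifier would sign

# A later edit to the same file no longer matches the signed manifest.
edited = {"2016-02-05.txt": b"Ran assay; yield 90%.\n"}

print(verify_manifest(manifest, day))     # unchanged file still matches
print(verify_manifest(manifest, edited))  # edited file does not match
```

Because the signature covers only the digests, the manifest and its signatures can be stored anywhere, including in a git repository alongside the notes.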

torek