I work in an industry where lots of data are processed in small steps. Typically this takes the form of some input data (e.g. a .csv file) being analyzed (let's say using an .exe) to produce some more data (e.g. a new .csv file). This is a bit generalized - in reality the input files could be several GB, and the process could just as easily be a Python or R script. We can also assume that all relevant input information can be captured in text or other input files, so that I do not need to capture information about mouse clicks or keyboard inputs.
In this industry, those processes have value: customers are generally paying for expensive, cutting-edge processing. Processes are updated regularly. These updates range from changing 1 character to throwing out the whole previous process, and the impact on results could also be quite large.
It is therefore very important for end users to know which version of the process was used, and to be able to confirm from the data they receive that I really used the expensive, up-to-date process that I claim to have used.
So, how can I prove to a customer that I really used my expensive processing routines?
I know I could give the customer a Git repository's unique ID (e.g. a commit hash), or a hash of each file (the input .csv file, the process .exe, and the output .csv file), to say "these are the files that I used".
But there's no guarantee that I really used that input data and analysis code to carry out the process. Hashes and Git IDs do not depend on the code actually being run: I could have used something computationally cheaper and just claimed to have used the fancy process.
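To make the hashing idea concrete, here is a minimal sketch of the kind of manifest I have in mind. The function names (`sha256_of_file`, `build_manifest`) and the choice of SHA-256 are my own assumptions for illustration, not an established scheme:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file.

    The file is read in chunks so that multi-GB inputs do not have to
    fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_path, process_path, output_path):
    """Collect the hashes of the three files into one plain dict.

    Note: this only records which files exist. Nothing here proves
    that the process was actually run on the input to get the output.
    """
    roles = (
        ("input", input_path),
        ("process", process_path),
        ("output", output_path),
    )
    return {
        role: {"file": Path(path).name, "sha256": sha256_of_file(path)}
        for role, path in roles
    }
```

Calling `build_manifest("input.csv", "process.exe", "output.csv")` (hypothetical file names) would give me a record I could hand to the customer, but it illustrates the gap: the three hashes are computed independently, so the manifest would look equally valid if the output had come from a different, cheaper process.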
I think that this processing and verification could be supported by blockchain technology, but I need some pointers there.
My question: is there some way that I can leverage Git, hashing, blockchain and/or other technologies to show a customer that I actually used the modern process that I am selling them to arrive at an end result, rather than a much cheaper or older approach?