I have a use case where I need to capture the data flow from one API to another. For example, my code reads data from a database using Hibernate, and during processing I convert one POJO to another, perform some more processing, and finally convert it into the final Hibernate result object. In a nutshell, something like POJO1 to POJO2 to POJO3.

In Java, is there a way to deduce that an attribute of POJO3 was made or transformed from a particular attribute of POJO1? I am looking for something that can capture the data flow from one model to another. The tool can work at either compile time or runtime; I am OK with both.

I am looking for a tool that can run in parallel with the code and provide data lineage details on a per-run basis.

Jon Goodwin
M.J.
  • You can put breakpoints and look at the data step by step – Kars Feb 21 '19 at 18:23
  • I want to capture this at service runtime instead of debugging. In a nutshell, capture data lineage and data flow during execution of this logic. – M.J. Feb 22 '19 at 08:21
  • Static application security testing (SAST) tools do exactly this (but not as part of normal program execution, AFAIK). You may want to check out the technologies they use; Veracode is an example. – David Soroko Feb 24 '19 at 10:12

3 Answers

Instead of POJOs, I will call them states. You have a start position, and you iterate and transform your model through different states. At the end you have a final terminal state that you would like to persist to the database.

stream(A).map(P1).map(P2).map(P3)....-> set of B

If you use a technique known as event sourcing, then yes, you can deduce it. What would this look like? Instead of mapping A directly to state P1 and state P1 to state P2, you queue all the operations that are necessary and sufficient to map A to P1, P1 to P2, and so on. If you want to recover P1 or P2 at any time, it is just the product of the queued operations. You can replay forward or rewind backwards at any time, as long as you have not yet changed your DB state. P1, P2, and P3 can act as snapshots.

This way you will be able to rebuild the exact mapping flow for this attribute. How finely you queue your operations, whether at attribute level or more coarse-grained, is up to you.
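
As a rough illustration of queuing operations at attribute level, here is a minimal sketch. It is not a full event-sourcing implementation: LineageLog, Event, record, traceBack and the attribute names such as "Pojo1.name" are all hypothetical, and it assumes each target attribute is derived from exactly one source attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: queues one event per attribute-level transformation.
final class LineageLog {
    // One queued operation: which attribute fed which, with the values seen.
    static final class Event {
        final String source;
        final String target;
        final Object input;
        final Object output;

        Event(String source, String target, Object input, Object output) {
            this.source = source;
            this.target = target;
            this.input = input;
            this.output = output;
        }
    }

    private final List<Event> events = new ArrayList<>();

    // Applies the transformation and records where the result came from.
    <I, O> O record(String source, String target, I input, Function<I, O> op) {
        O output = op.apply(input);
        events.add(new Event(source, target, input, output));
        return output;
    }

    // Walks the queue backwards from a target attribute to its origin.
    // Assumption: a single source per target; fan-in would need a graph.
    List<String> traceBack(String target) {
        List<String> path = new ArrayList<>();
        path.add(target);
        String current = target;
        for (int i = events.size() - 1; i >= 0; i--) {
            Event e = events.get(i);
            if (e.target.equals(current)) {
                path.add(e.source);
                current = e.source;
            }
        }
        return path;
    }

    public static void main(String[] args) {
        LineageLog log = new LineageLog();
        String name = "alice"; // stands in for Pojo1.name
        String display = log.record("Pojo1.name", "Pojo2.displayName",
                name, s -> s.toUpperCase());
        Integer length = log.record("Pojo2.displayName", "Pojo3.nameLength",
                display, s -> s.length());
        // prints [Pojo3.nameLength, Pojo2.displayName, Pojo1.name]
        System.out.println(log.traceBack("Pojo3.nameLength"));
    }
}
```

Because every event also keeps the input and output values, the same queue could be persisted per run if lineage is needed after the fact.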

Here is a good article that depicts event sourcing and how it works: https://kickstarter.engineering/event-sourcing-made-simple-4a2625113224

UPDATE:

I can think of one more technique to capture the attribute changes. You can instrument your POJOs; it is pretty much the same technique that Hibernate uses to enhance POJOs and that profilers use for tracing. Then you can capture and react to each setter invocation on Pojo1, Pojo2, and Pojo3. I am not sure I would go that way, though.

Here is some detailed reading about bytecode instrumentation: https://www.cs.helsinki.fi/u/pohjalai/k05/okk/seminar/Aarniala-instrumenting.pdf
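
To give a feel for reacting to setter invocations without a bytecode library, the sketch below swaps in a simpler stand-in: a JDK dynamic proxy (java.lang.reflect.Proxy). This is not bytecode instrumentation, and it only works when the POJO is accessed through an interface. SetterTracker, Customer and CustomerImpl are hypothetical names used for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracker: wraps an object and notes every setter call.
final class SetterTracker {
    private final List<String> calls = new ArrayList<>();

    // Returns a proxy that logs setXxx(value) calls, then delegates.
    <T> T track(T target, Class<T> type) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean isSetter = method.getName().startsWith("set")
                    && args != null && args.length == 1;
            if (isSetter) {
                calls.add(type.getSimpleName() + "." + method.getName()
                        + "(" + args[0] + ")");
            }
            return method.invoke(target, args);
        };
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(proxy);
    }

    List<String> calls() {
        return calls;
    }

    public static void main(String[] args) {
        SetterTracker tracker = new SetterTracker();
        Customer c = tracker.track(new CustomerImpl(), Customer.class);
        c.setName("alice");
        // prints [Customer.setName(alice)]
        System.out.println(tracker.calls());
    }
}

// Hypothetical model interface; JDK proxies only work on interfaces.
interface Customer {
    String getName();
    void setName(String name);
}

final class CustomerImpl implements Customer {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}
```

Real instrumentation would not need the interface, but the handler logic (intercept the setter, record it, delegate) would be much the same.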

Alexander Petrov

I can imagine two reasons: either the code was not developed by you and you want to understand how data flows and combines to convert input to output, or your code is behaving in a way you do not expect. I think you need to log the values of all the POJOs, inputs, and outputs for each run to somewhere you can inspect later. For example, use a database table if you might need the data after hundreds of runs; if it is a one-time need, a log in an appropriate format may do. You then need to manually map those values layer by layer to the next layer. With the code available, that should be easy. If you have a different need, please explain.
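
A minimal sketch of that per-run logging, assuming plain POJOs with private fields: SnapshotLogger, logLine, the run ID format and Pojo1 are hypothetical, and a real service would write to a table or logging framework instead of standard output.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: dumps a POJO's field values for later inspection.
final class SnapshotLogger {
    // Reads every non-static declared field into a name-sorted map.
    static Map<String, Object> snapshot(Object pojo) {
        Map<String, Object> values = new TreeMap<>();
        for (Field f : pojo.getClass().getDeclaredFields()) {
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            try {
                f.setAccessible(true);
                values.put(f.getName(), f.get(pojo));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }

    // Builds one log line per stage so runs can be compared afterwards.
    static String logLine(String runId, String stage, Object pojo) {
        return "run=" + runId + " stage=" + stage + " values=" + snapshot(pojo);
    }

    public static void main(String[] args) {
        // prints run=42 stage=POJO1 values={id=1, name=alice}
        System.out.println(logLine("42", "POJO1", new Pojo1(1, "alice")));
    }
}

// Hypothetical model class used only for the demo.
final class Pojo1 {
    private final int id;
    private final String name;

    Pojo1(int id, String name) {
        this.id = id;
        this.name = name;
    }
}
```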

Please accept and like if you appreciate my effort to help with my ideas and experience.

Mayank J

There are "time travelling debuggers". For Java, a quick search turned up only this: Chronon Time Travelling Debugger. See this screencast for how it might help you.

Since your transformations probably use setters and getters, this tool might also be interesting: Flow

Writing your own Java agent for tracking this is probably not what you want. You might be able to use AspectJ to add some stack trace logging to getters and setters. See here for a quick introduction.

chromanoid