I have a data structure that looks like this:
    <client>: {
        <document>: [
            {'start': <datetime>,
             'end': <datetime>,
             'group': <string>}
        ]
    }
The list of dictionaries within a <document> is sorted by the 'start' date, and an entry cannot start before the previous one ends. I iterate over this data structure and collect the values of 'group', in chronological order, into a new structure, e.g.:
    <client>: {
        <document>: {'progression': <group_1>|<group_2>|...|<group_n>}
    }
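One way to build the 'progression' strings is shown below. This is a minimal sketch that assumes the nested structure is plain Python dicts and lists with the keys shown above; the client and document names in the sample data are hypothetical. The entries are re-sorted by 'start' defensively, which changes nothing when the input is already sorted.

```python
from datetime import datetime


def build_progressions(data):
    """Map every client/document pair to a 'group_1|group_2|...' string.

    Assumes data is shaped {client: {document: [{'start', 'end', 'group'}, ...]}}.
    """
    return {
        client: {
            document: {
                "progression": "|".join(
                    entry["group"]
                    # Defensive sort by start date; a no-op if already sorted.
                    for entry in sorted(entries, key=lambda e: e["start"])
                )
            }
            for document, entries in documents.items()
        }
        for client, documents in data.items()
    }


# Hypothetical sample data (client/document names are made up).
data = {
    "client_a": {
        "doc_1": [
            {"start": datetime(2020, 1, 1), "end": datetime(2020, 2, 1), "group": "abc"},
            {"start": datetime(2020, 2, 1), "end": datetime(2020, 3, 1), "group": "def"},
            {"start": datetime(2020, 3, 1), "end": datetime(2020, 4, 1), "group": "ghi"},
        ],
    },
}
progressions = build_progressions(data)
print(progressions)  # {'client_a': {'doc_1': {'progression': 'abc|def|ghi'}}}
```

Joining with "|" assumes no group name itself contains a pipe; if one might, keeping a tuple of groups instead of a string avoids ambiguity.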
where <group_1> corresponds to the value of 'group' for the first dictionary in <document>, and so on. I want to visualize this progression of groups across all documents. For example, I know that I have 5,000 entries starting with "abc" (before the first pipe); of those, 2,000 are followed by "def", giving "abc"|"def". Of those 2,000, 500 revert to "abc" ("abc"|"def"|"abc") and the remaining 1,500 are followed by "ghi" ("abc"|"def"|"ghi"). The remaining 3,000 entries starting with "abc" follow some other progression pattern.
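The numbers in this example are counts of progression *prefixes*. A minimal sketch of one way to get all of them from a single `collections.Counter` follows, assuming a flat list of 'progression' strings as input; each prefix is stored as a tuple of groups, so `("abc", "def")` counts how many progressions start with "abc" followed by "def". This is an illustration, not necessarily how the existing Counter-based code is organized.

```python
from collections import Counter


def prefix_counts(progressions):
    """Count every prefix of every 'a|b|c' progression string."""
    counts = Counter()
    for progression in progressions:
        if not progression:  # skip documents with no entries
            continue
        groups = tuple(progression.split("|"))
        # ('abc',), ('abc', 'def'), ('abc', 'def', 'ghi'), ...
        for depth in range(1, len(groups) + 1):
            counts[groups[:depth]] += 1
    return counts


# Small hypothetical sample, not the real data.
sample = ["abc|def|abc", "abc|def|ghi", "abc|def|ghi", "abc|xyz"]
counts = prefix_counts(sample)
print(counts[("abc",)])               # 4
print(counts[("abc", "def")])         # 3
print(counts[("abc", "def", "ghi")])  # 2
```

Because "abc" after "def" is keyed as `("abc", "def", "abc")`, it stays distinct from the root `("abc",)`, which is exactly what a tree-shaped plot needs.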
What I am trying to do is visualize this progression with something like a Sankey diagram, or another appropriate tree-like structure, in which the top node would be "abc", with a "thick" branch to the left corresponding to the different progression pattern and a "thinner" branch to the right corresponding to the 2,000 "abc" cases followed by "def". "def" would then be another node with similar branches, one leading to a new "abc" (for the "abc"|"def"|"abc" case) and one leading to "ghi" (for the "abc"|"def"|"ghi" case), preferably with each node annotated with its count as the "tree" thins down. I use a combination of Python Counter structures to retrieve the numbers for each possible progression, but I do not know how to create such a visualization programmatically.
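For a Sankey-style plot, the prefix counts have to be flattened into node labels plus parallel source/target/value lists. The sketch below assumes a Counter keyed by prefix tuples (with every parent prefix present) and uses the counts from the example above; the 3,000 "other" cases are left out because their groups are not specified. Each distinct prefix becomes its own node, so the second "abc" is a separate box from the root.

```python
from collections import Counter


def sankey_lists(counts):
    """Turn prefix counts into (labels, sources, targets, values) lists.

    Assumes every parent prefix of a counted prefix is also in `counts`.
    """
    # Shorter prefixes first, so parents get lower indices than children.
    prefixes = sorted(counts, key=lambda p: (len(p), p))
    index = {prefix: i for i, prefix in enumerate(prefixes)}
    labels = [f"{prefix[-1]} ({counts[prefix]:,})" for prefix in prefixes]
    sources, targets, values = [], [], []
    for prefix in prefixes:
        if len(prefix) > 1:
            sources.append(index[prefix[:-1]])  # parent node
            targets.append(index[prefix])       # this node
            values.append(counts[prefix])       # link width
    return labels, sources, targets, values


counts = Counter({
    ("abc",): 5000,
    ("abc", "def"): 2000,
    ("abc", "def", "abc"): 500,
    ("abc", "def", "ghi"): 1500,
})
labels, sources, targets, values = sankey_lists(counts)
print(labels)   # ['abc (5,000)', 'def (2,000)', 'abc (500)', 'ghi (1,500)']
print(sources)  # [0, 1, 1]
print(targets)  # [1, 2, 3]
print(values)   # [2000, 500, 1500]

# If Plotly is available (an assumption; it is not mentioned above),
# these lists plug into its Sankey trace roughly like this:
#   import plotly.graph_objects as go
#   fig = go.Figure(go.Sankey(node=dict(label=labels),
#                             link=dict(source=sources, target=targets, value=values)))
#   fig.show()
```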
My understanding is that this is probably a problem that can be addressed with the DOT language and packages like pydot and/or pygraphviz, but I am not sure whether I am on the right track.
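DOT is indeed a reasonable fit for a top-down tree. The sketch below writes DOT source with plain string formatting, so it needs neither pydot nor pygraphviz, under the assumption that group names contain no double quotes. Node labels show the group and its count, and edge `penwidth` is scaled by count so branches thin as progressions diverge. Graphviz chooses the left/right placement of siblings itself, so the "thick branch on the left" layout is not guaranteed.

```python
from collections import Counter


def to_dot(counts, max_penwidth=10.0):
    """Render prefix counts as a top-down Graphviz DOT digraph."""
    biggest = max(counts.values())
    prefixes = sorted(counts, key=lambda p: (len(p), p))
    node_id = {prefix: f"n{i}" for i, prefix in enumerate(prefixes)}
    lines = ["digraph progression {", "    rankdir=TB;", "    node [shape=box];"]
    for prefix in prefixes:
        # "\\n" emits a literal \n, which Graphviz treats as a line break.
        label = f"{prefix[-1]}\\n{counts[prefix]:,}"
        lines.append(f'    {node_id[prefix]} [label="{label}"];')
    for prefix in prefixes:
        if len(prefix) > 1:
            # Edge thickness proportional to count, with a floor of 1.
            width = max(1.0, max_penwidth * counts[prefix] / biggest)
            lines.append(f"    {node_id[prefix[:-1]]} -> {node_id[prefix]} [penwidth={width:.1f}];")
    lines.append("}")
    return "\n".join(lines)


counts = Counter({
    ("abc",): 5000,
    ("abc", "def"): 2000,
    ("abc", "def", "abc"): 500,
    ("abc", "def", "ghi"): 1500,
})
dot_source = to_dot(counts)
print(dot_source)
# Save it to a file and render with the Graphviz CLI, e.g.:
#   dot -Tpng progression.dot -o progression.png
```

If you prefer to stay inside Python, the same string can be loaded with `pydot.graph_from_dot_data(dot_source)` (which returns a list of graphs) or `pygraphviz.AGraph(string=dot_source)` and rendered from there.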