I want to generate a Sankey Diagram from product data looking like this.
id begin_date status
1 01.02.2020 a
1 10.02.2020 b
1 17.02.2020 c
2 02.02.2020 d
2 06.03.2020 b
2 17.04.2020 c
For your experimentation:
pd.DataFrame([[1, '2020-02-01', 'a'], [1, '2020-02-10', 'b'], [1, '2020-02-17', 'c'], [2, '2020-02-02', 'd'], [2, '2020-03-06', 'b'],[2, '2020-04-17', 'c']], columns=['id', 'begin_date', 'status'])
After looking at this explanation: Draw Sankey Diagram from dataframe I want to construct the "Source-Target-Value"-Dataframe looking like this. To improve understanding, I did not convert Source and Target to integers.
# with Source = previous status
# with Target = next status
# with Value = count of IDs that transition from Source to Target
Source Target Value Link Color
a b 1 rgba(127, 194, 65, 0.2)
b c 2 rgba(127, 194, 65, 0.2)
d b 1 rgba(211, 211, 211, 0.5)
The problem lies in generating Source, Target, and Value.
The Source
and Target
should be the status transition from a
to b
. The Value
is the count of id
s doing that transition.
What is the best way to do this?
EDIT: Using an online generator, the result would look like this: