
I've written a way to programmatically build the source, target and value lists for a Plotly Sankey diagram, starting from a list of dictionaries. If you were looking for a way to do that, here it is.

However, I'm stuck on how to define the node labels with a similar method.

Any help is appreciated.

my_data = [
{'src':'wages','dst':'budget', 'value':1500},
{'src':'other','dst':'budget', 'value':250},
{'src':'budget','dst':'taxes', 'value':450},
{'src':'budget','dst':'housing', 'value':420},
{'src':'budget','dst':'food', 'value':400},
{'src':'budget','dst':'transportation', 'value':295},
{'src':'budget','dst':'savings', 'value':25},
{'src':'budget','dst':'other necessities', 'value':160},
]

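node_names = []  # unique node names, in first-seen order
my_data2 = []

# Assign 1-based ids to the source nodes. The id is looked up by position,
# so a repeated name always gets the id it was first given.
for row in my_data:
    key_src = row['src']
    if key_src not in node_names:
        node_names.append(key_src)
    row['src_id'] = node_names.index(key_src) + 1
    my_data2.append(row)

# Same for the destination nodes, continuing the same numbering.
for row in my_data:
    key_dst = row['dst']
    if key_dst not in node_names:
        node_names.append(key_dst)
    row['dst_id'] = node_names.index(key_dst) + 1
    my_data2.append(row)

del node_names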

my_data2 = [dict(t) for t in {tuple(d.items()) for d in my_data2}] # Remove duplicates 


source = []
target = []
value = []

for row in my_data2:
    source.append(row['src_id'])
    target.append(row['dst_id'])
    value.append(row['value'])
    

print(source)
print(target)
print(value)


import plotly.graph_objects as go

# placeholder labels
label = ["ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE"]
# data to dict, dict to sankey
link = dict(source = source, target = target, value = value)
node = dict(label = label, pad=50, thickness=5)
data = go.Sankey(link = link, node=node)
# plot
fig = go.Figure(data)
fig.show()

1 Answer

One way to build the labels is to load the original list of dictionaries into a data frame. Take the unique strings in the src column and the unique strings in the dst column, and join the two lists together. The string that appears in both lists ('budget') is the label for the center node. set() removes that duplicate, and sorting with key=src_dst.index restores the original list order. Finally, an empty string is inserted at the beginning, because the ids in the question start at 1 while Plotly indexes the labels from 0.

import pandas as pd

df = pd.DataFrame.from_dict(my_data)
df

      src                dst  value  src_id  dst_id
0   wages             budget   1500       1       3
1   other             budget    250       2       3
2  budget              taxes    450       3       4
3  budget            housing    420       3       5
4  budget               food    400       3       6
5  budget     transportation    295       3       7
6  budget            savings     25       3       8
7  budget  other necessities    160       3       9


src_dst = list(df['src'].unique()) + list(df['dst'].unique())
labels = sorted(set(src_dst), key=src_dst.index)
labels.insert(0,'')

labels
['',
 'wages',
 'other',
 'budget',
 'taxes',
 'housing',
 'food',
 'transportation',
 'savings',
 'other necessities']
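
As an aside, the same ordered, de-duplicated list can probably be built without pandas. This is only a sketch: it assumes the my_data list from the question (shortened to four rows here) and relies on dict.fromkeys keeping insertion order, which Python guarantees from 3.7 on.

```python
# Rows from the question, shortened for brevity.
my_data = [
    {'src': 'wages', 'dst': 'budget', 'value': 1500},
    {'src': 'other', 'dst': 'budget', 'value': 250},
    {'src': 'budget', 'dst': 'taxes', 'value': 450},
    {'src': 'budget', 'dst': 'housing', 'value': 420},
]

# All sources first, then all destinations, matching the id order in the question.
names = [row['src'] for row in my_data] + [row['dst'] for row in my_data]

# dict.fromkeys drops duplicates and keeps first-seen order.
# The leading '' fills index 0, since the question's ids start at 1.
labels = [''] + list(dict.fromkeys(names))
print(labels)
```

This avoids the sorted(set(...)) step, since the duplicates are removed in order in a single pass.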

import plotly.graph_objects as go

# use the generated labels in place of the hard-coded placeholders
label = labels
# data to dict, dict to sankey
link = dict(source = source, target = target, value = value)
node = dict(label = label, pad=50, thickness=5)
data = go.Sankey(link = link, node=node)
# plot
fig = go.Figure(data)
fig.show()
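
If you would rather not carry the empty placeholder label, one option is to number the nodes from 0 and build the labels in the same pass as the ids. The sketch below is an assumption on my part, not the code from the question: build_sankey_lists is a hypothetical helper name, and the rows are a shortened copy of my_data.

```python
def build_sankey_lists(rows):
    """Return (labels, source, target, value) using 0-based node indices."""
    # Map each node name to its position. Sources are numbered first,
    # then destinations, so the order matches the labels shown earlier.
    index = {}
    for key in ('src', 'dst'):
        for row in rows:
            index.setdefault(row[key], len(index))

    labels = list(index)  # dicts keep insertion order (Python 3.7+)
    source = [index[row['src']] for row in rows]
    target = [index[row['dst']] for row in rows]
    value = [row['value'] for row in rows]
    return labels, source, target, value


# Shortened copy of the question's data.
rows = [
    {'src': 'wages', 'dst': 'budget', 'value': 1500},
    {'src': 'other', 'dst': 'budget', 'value': 250},
    {'src': 'budget', 'dst': 'taxes', 'value': 450},
    {'src': 'budget', 'dst': 'housing', 'value': 420},
]

labels, source, target, value = build_sankey_lists(rows)
print(labels)
print(source, target, value)
```

The four lists can then be passed as go.Sankey(node=dict(label=labels), link=dict(source=source, target=target, value=value)). Because the indices start at 0, every label lines up with a real node and no '' entry is needed.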


r-beginners
  • This is a much better idea than what I was doing. But when I run your code I get this. Did you mean to make a dataframe called src? labels = sorted(set(src), key=src.index) NameError: name 'src' is not defined – PinAppleRedbull Aug 28 '22 at 05:51
  • The code was incorrect and has been corrected. The name src_dst, which combines each list, is correct. – r-beginners Aug 28 '22 at 07:02