
The plotly library has some nice Sankey diagrams: https://plotly.com/python/sankey-diagram/

But the data format requires you to pass the indices of the source/target pairs:

    link = dict(
      source = [0, 1, 0, 2, 3, 3], # indices correspond to labels, eg A1, A2, A1, B1, ...
      target = [2, 3, 3, 4, 4, 5],

I was wondering if there's an API to simply pass a named list of these pairs?

links = [
    {'source': 'start', 'target': 'A', 'value': 2},
    {'source': 'A', 'target': 'B', 'value': 2},
...
]

This is more in line with how bokeh/holoviews expects data (though that Sankey doesn't work with self-loops),

and also this pysankey widget.

That way I could map more closely to my dataframe without preprocessing everything.

Or is there a nice Pythonic way to do this conversion in a one-liner? :D

dcsan

1 Answer

  • The structure is already in a format the pandas DataFrame constructor accepts.
  • Create a dataframe from it, plus a Series that maps each node name to its index.
  • From these it's simple to construct the Sankey plot.
import pandas as pd
import numpy as np
import plotly.graph_objects as go

links = [
    {'source': 'start', 'target': 'A', 'value': 2},
    {'source': 'A', 'target': 'B', 'value': 1},
    {'source': 'A', 'target': 'C', 'value': .5},
]

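df = pd.DataFrame(links)
# unique node names across both columns (np.unique returns them sorted)
nodes = np.unique(df[["source","target"]], axis=None)
# lookup Series: node name -> integer index used by go.Sankey
nodes = pd.Series(index=nodes, data=range(len(nodes)))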

go.Figure(
    go.Sankey(
        node={"label": nodes.index},
        link={
            "source": nodes.loc[df["source"]],
            "target": nodes.loc[df["target"]],
            "value": df["value"],
        },
    )
)
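
For comparison, the same name-to-index conversion can be sketched in plain Python, which may read more easily if the pandas/numpy idioms are unfamiliar. This is only a minimal sketch: `index_links` is a hypothetical helper name, not part of plotly, and it numbers nodes in first-seen order rather than the sorted order that `np.unique` produces.

```python
def index_links(links):
    """Convert named source/target links into index-based lists.

    Hypothetical helper (not a plotly API): returns the labels plus
    the integer source/target lists that go.Sankey expects.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order
    labels = list(dict.fromkeys(
        name for link in links for name in (link["source"], link["target"])
    ))
    # node name -> position in the labels list
    position = {label: i for i, label in enumerate(labels)}
    return {
        "label": labels,
        "source": [position[link["source"]] for link in links],
        "target": [position[link["target"]] for link in links],
        "value": [link["value"] for link in links],
    }


links = [
    {"source": "start", "target": "A", "value": 2},
    {"source": "A", "target": "B", "value": 1},
    {"source": "A", "target": "C", "value": 0.5},
]
sankey_data = index_links(links)
```

The result should slot into the figure as `node={"label": sankey_data["label"]}` and `link={k: sankey_data[k] for k in ("source", "target", "value")}`.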


Rob Raymond
  • nice! so you'd prefer `nodes = np.unique(df.loc[:,["source","target"]].values.ravel())` to something like a list comprehension on the keys? it's a bit hard to read for me with the `[:,` and `ravel` ... but i don't have a simpler alternative. – dcsan Oct 06 '21 at 11:51
  • https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html would do the same thing. I've been working with pandas and numpy for quite some time, so I'm comfortable with the idioms... :-) Maybe this is more readable: `np.unique(df[["source","target"]].values.flatten())` – Rob Raymond Oct 06 '21 at 13:00
  • `nodes = np.unique(df[["source","target"]], axis=None)` is even more succinct ... – Rob Raymond Oct 06 '21 at 14:04
  • much nicer, thanks! do you want to edit your answer? i've accepted anyway – dcsan Oct 06 '21 at 16:56