I have the data below and want to create a stacked Sankey diagram with ggplot that looks like the following picture. What's the best way to go about it?

Risk Factors for Stroke    1990    1995    2000    2005    2010
Obesity                    0.001   0.013   0.043   0.077   0.115
Diabetes                   0.359   0.316   0.26    0.187   0.092
Smoking                    0.171   0.156   0.142   0.128   0.116
Hypercholesterolemia       0.161   0.104   0.045   0.001   0.001
Hypertension               0.654   0.633   0.602   0.561   0.509

I want to recreate this diagram with the data: [image]

This is what I have tried so far, but I don't think it reshapes my data the way I need:

D2 <- Datatable1 %>% make_long(`Risk Factors for Stroke in Blacks`, `1990`, `1995`, `2000`, `2005`, `2010`)
D2
ElHombre
  • Welcome to SO. Besides that single line of code, which attempts to reshape your data, what else have you tried? I think you can use data.table: `longData = data.table::melt(Datatable1, 1)` to reshape your data. From there you have a long way to go if you want a Sankey diagram. – PavoDive Dec 30 '22 at 18:39
  • This recent answer to a similar question might be helpful: https://stackoverflow.com/a/74964023/20513099 – I_O Dec 30 '22 at 18:42

1 Answer

This looks close enough to get you started:

library(data.table)
library(ggplot2)
library(ggalluvial)
# read sample data
DT <- fread('"Risk Factors for Stroke"             1990    1995    2000    2005    2010
Obesity                                 0.001   0.013   0.043   0.077   0.115
Diabetes                            0.359   0.316   0.26    0.187   0.092
Smoking                                 0.171   0.156   0.142   0.128   0.116
Hypercholesterolemia                    0.161   0.104   0.045   0.001   0.001
Hypertension                            0.654   0.633   0.602   0.561   0.509', header = TRUE)
# create workable column-names
setnames(DT, janitor::make_clean_names(names(DT)))
# melt to long format
DT.melt <- melt(DT, id.vars = "risk_factors_for_stroke")
# rank the risks within each year by descending value (id = 1 is the largest); used as the stratum
DT.melt[order(-value, variable), id := factor(rowid(variable))]
# create plot
ggplot(data = DT.melt, 
       aes(x = variable, y = value,
           stratum = id, 
           alluvium = risk_factors_for_stroke, 
           fill = risk_factors_for_stroke, 
           colour = id,
           label = value)) + 
  geom_flow(stat = "alluvium", lode.guidance = "frontback",
            color = "white") +
  geom_stratum(color = "white", width = 0.7) +
  geom_text(position = position_stack(vjust = 0.5), colour = "white")
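
The `id` column is what makes the bands re-sort in every year, so it may help to look at it in isolation. Below is a minimal base-R sketch of the same ranking step, assuming the same sample values; the names `long` and `risk_factor` are only illustrative and are not used in the plot code above.

```r
# Same sample values as above, typed directly in long format (base R only).
risk <- c("Obesity", "Diabetes", "Smoking", "Hypercholesterolemia", "Hypertension")
long <- data.frame(
  risk_factor = rep(risk, times = 5),
  year        = rep(c(1990, 1995, 2000, 2005, 2010), each = 5),
  value       = c(0.001, 0.359, 0.171, 0.161, 0.654,   # 1990
                  0.013, 0.316, 0.156, 0.104, 0.633,   # 1995
                  0.043, 0.26,  0.142, 0.045, 0.602,   # 2000
                  0.077, 0.187, 0.128, 0.001, 0.561,   # 2005
                  0.115, 0.092, 0.116, 0.001, 0.509)   # 2010
)

# Rank within each year, 1 = largest value.
# Ranking the negated values gives a descending order.
long$id <- ave(-long$value, long$year,
               FUN = function(v) rank(v, ties.method = "first"))

# Inspect one year, sorted by rank
subset(long[order(long$year, long$id), ], year == 2010)
```

Because the rank is computed per year, a risk factor can change position between columns: in 2010 Smoking (0.116) sits just above Obesity (0.115), while in 1990 Obesity is last. That is the effect `stratum = id` produces in the plot.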

Wimpel