Problem

How can I change the height of each section/node of a Sankey diagram? I want to create something like Image 1 below, where the 'gender' section is short, the 'cause' section is tall, and the 'age' section is short again:

Image 1

My output is more like Image 2, where each section (Fuels, Sectors, End uses, Conversion devices) has the same height:

Image 2

Code:

library(ggplot2)
library(ggalluvial)
library(RColorBrewer)

dfs <- dftest[ , c("Hospital", "Paciente", "Terapia", "Unit")]
alpha <- 1
getPalette <- colorRampPalette(brewer.pal(12, "Set3"))
colourCount <- length(unique(dfs$Hospital))
ggplot(dfs,
       aes(axis1 = Hospital, axis2 = Paciente, axis3=Terapia)) +
  geom_alluvium(aes(fill = Hospital), 
                width = 1/12, alpha = alpha, knot.pos = 0.5) +
  geom_stratum(width = 1/20) +
  scale_x_continuous(breaks = 1:3, labels = c("Hospital", "Patient", "Therapy")) +
  scale_fill_manual(values = getPalette(colourCount)) +
  ggtitle("Teste") +
  theme_minimal() +
  theme( legend.position = "none", panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(), axis.text.y = element_blank(),
         axis.text.x = element_text(size = 12, face = "bold"))

I thought I could create a Sankey diagram similar to Image 1. Below is the output of dput(dfs) for a made-up dataset:

dput(dfs)
structure(list(Hospital = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("1", 
"2", "3", "4", "5"), class = "factor"), Paciente = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L
), .Label = c("21", "22", "23", "24", "25", "26", "27"), class = "factor"), 
    Terapia = structure(c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
    3L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 
    1L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
    3L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 
    1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("Adalimumab", 
    "Etanercept", "Infliximab", "Rituximab"), class = "factor"), 
    Unit = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), class = "data.frame", row.names = c(NA, 
-65L))

Can anyone please advise?

  • Must your answer be in ggplot? [Sankey diagrams in R](https://stackoverflow.com/questions/9968433/sankey-diagrams-in-r) lists tons of packages you might check to see if they support this. – smci Mar 10 '19 at 03:04
  • Thanks for your reply. I went through that list already and I tried different approaches but I believe this code might be giving me the best result for my problem. I was just wondering whether the code needs some extra steps in order to do what I would like and that's why I ask. Thanks. – Daniela Rodrigues Mar 10 '19 at 03:22
  • I don't understand: must your answer be in ggplot, or was that just an example and you will accept any answer in any package, but ggplot is preferred? (I don't know Sankey plots, so I can't answer) – smci Mar 10 '19 at 03:24
  • Can you include the output of `dput(dfs)` in your question? Reproducibility would go a long way in getting more people to take a stab at your question. – Z.Lin Mar 10 '19 at 04:15

1 Answer

I think the ggalluvial package's geoms were not designed for free-floating sections. However, as its creator noted in the package vignette, the ggforce package has something similar, if the following look is what you are going for:

plot

Code used:

library(ggplot2)  # ggforce's geoms are added to a ggplot() object
library(ggforce)

# transform dataframe into appropriate format
dfs2 <- gather_set_data(dfs, 1:3)

# define axis-width / sep parameters once here, to be used by
# each geom layer in the plot
aw <- 0.1
sp <- 0.1

ggplot(dfs2, 
       aes(x = x, id = id, split = y, value = Unit)) +
  geom_parallel_sets(aes(fill = Hospital), alpha = 0.3, 
                     axis.width = aw, sep = sp) +
  geom_parallel_sets_axes(axis.width = aw, sep = sp) +
  geom_parallel_sets_labels(colour = "white", 
                            angle = 0, size = 3,
                            axis.width = aw, sep = sp) +
  theme_minimal()
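
To make the reshaping step less of a black box, here is a minimal base R sketch of the kind of long format that the `x`, `id` and `y` aesthetics above rely on. The helper `to_sets_long()` and the toy data are hypothetical and only imitate the idea; this is not ggforce's actual implementation, and converting `y` to character is a simplification.

```r
# Toy wide-format data (made-up values): one row per record
wide <- data.frame(
  Hospital = c("1", "1", "2"),
  Terapia  = c("Etanercept", "Infliximab", "Etanercept"),
  Unit     = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack one copy of the data per axis column
to_sets_long <- function(data, cols) {
  data$id <- seq_len(nrow(data))           # which original row a record came from
  pieces <- lapply(cols, function(col) {
    piece <- data
    piece$x <- col                         # the axis this copy belongs to
    piece$y <- as.character(data[[col]])   # the category on that axis
    piece
  })
  do.call(rbind, pieces)
}

long <- to_sets_long(wide, c("Hospital", "Terapia"))
print(long)
```

Each original row shows up once per axis, so three rows and two axes give six rows. The shared `id` is what lets a record be followed from one axis to the next.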

Here are some demonstrations with different parameter values:

demonstrations
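
If you want to try other values yourself, something along these lines should work. It is only a sketch: the toy data frame is made up, `plot_sets()` is a hypothetical helper that wraps the same three layers as the code above, and the specific `axis.width` / `sep` values are arbitrary picks for comparison.

```r
library(ggplot2)
library(ggforce)

# Small made-up dataset with the same columns as dfs (illustrative values only)
toy <- data.frame(
  Hospital = factor(c("1", "1", "2", "2", "3")),
  Paciente = factor(c("21", "22", "23", "23", "24")),
  Terapia  = factor(c("Etanercept", "Infliximab", "Adalimumab",
                      "Infliximab", "Etanercept")),
  Unit     = 1
)
toy_long <- gather_set_data(toy, 1:3)

# Hypothetical helper: same layers as above, with the two parameters exposed
plot_sets <- function(data, aw, sp) {
  ggplot(data, aes(x = x, id = id, split = y, value = Unit)) +
    geom_parallel_sets(aes(fill = Hospital), alpha = 0.3,
                       axis.width = aw, sep = sp) +
    geom_parallel_sets_axes(axis.width = aw, sep = sp) +
    geom_parallel_sets_labels(colour = "white", angle = 0, size = 3,
                              axis.width = aw, sep = sp) +
    ggtitle(sprintf("axis.width = %s, sep = %s", aw, sp)) +
    theme_minimal()
}

p_tight <- plot_sets(toy_long, aw = 0.1, sp = 0.02)  # sections nearly touching
p_loose <- plot_sets(toy_long, aw = 0.3, sp = 0.3)   # wider axes, larger gaps
```

Print `p_tight` and `p_loose` to compare them. Passing the same `aw` and `sp` to every layer keeps the flows, axes and labels aligned with one another.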

Z.Lin
  • @DanielaRodrigues It's hard to think of much without having sight of your data. Does this problem continue to occur if you trim down your actual dataset to fewer sections / centres / patients / therapies? If you can narrow down to a smaller subset of your dataset that causes the problem, it may be easier for you to find the issue yourself, or post here for others to try. – Z.Lin Mar 11 '19 at 02:16