I have seen this post and "Sankey plot in R" on how to make a Sankey plot, but they are too complicated for me to understand.

I would appreciate it if someone could explain how to make a Sankey plot for a data frame like the one below:

data.frame(row.names = paste0("SP","",1:30),
           COL1 = rep(sample(LETTERS[1:3]),10),
           COL3 = rep(sample(LETTERS[1:3]),10),
           COL3 = rep(sample(LETTERS[1:3]),10)
           )

I want to visualize the values as 3 bars, one per column, each with 3 segments representing the 3 factor levels "A", "B" and "C".

s__
Seymoo
  • Can you put more details on your desired output? – cirofdo May 24 '18 at 17:23
  • @TheBiro I have a data.frame where each row name is one sample and the values in each column are the predictions of a different classifier (assume 3 possible outcomes) on that sample. I want to show the proportion of each outcome in each column and how it connects to the results in the other columns. It should be like `table(data.frame$COL1,data.frame$COL2)`, but visual and with direction – Seymoo May 24 '18 at 17:53

1 Answer

Let's try:

# load the packages
library(ggplot2)
library(ggalluvial)  # provides geom_flow() and geom_stratum()
library(reshape)     # provides melt()

# set the seed so the random sample is reproducible
set.seed(1)
data <- data.frame(row.names = paste0("SP","",1:30),
           id = paste0("SP","",1:30),              # added the rowlabels as id
           COL1 = rep(sample(LETTERS[1:3]),10),
           COL2 = rep(sample(LETTERS[1:3]),10),    # rename a column
           COL3 =rep(sample(LETTERS[1:3]),10))


# put the data in long format for ggplot: one row per (id, column) pair
mdata <- melt(data, id.vars = "id")


# draw the Sankey (alluvial) plot: one bar per column, one flow per sample id
ggplot(mdata,aes(x = variable, stratum = value, alluvium = id, fill = value, label = value))+
  geom_flow(stat = "alluvium", lode.guidance = "rightleft") +
  geom_stratum() + geom_text(stat = "stratum")+
  theme(legend.position = "none") +
  ggtitle("test")
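
If the reshape package is not available, the long-format step above can also be sketched in base R. This is only an illustration and not part of the original answer: `cols` and `mdata_base` are names made up here, and the columns `variable` and `value` are named to match what `melt()` produces.

```r
# Rebuild the example data (same structure as above)
set.seed(1)
data <- data.frame(id   = paste0("SP", 1:30),
                   COL1 = rep(sample(LETTERS[1:3]), 10),
                   COL2 = rep(sample(LETTERS[1:3]), 10),
                   COL3 = rep(sample(LETTERS[1:3]), 10))

cols <- c("COL1", "COL2", "COL3")

# One row per (sample, column) pair: 30 samples x 3 columns = 90 rows
mdata_base <- data.frame(
  id       = rep(data$id, times = length(cols)),                   # ids repeated once per column
  variable = factor(rep(cols, each = nrow(data)), levels = cols),  # which column the value came from
  value    = unlist(data[cols], use.names = FALSE)                 # COL1 values, then COL2, then COL3
)

str(mdata_base)
```

`mdata_base` should be usable in place of `mdata` in the `ggplot()` call, since that call only relies on the `id`, `variable` and `value` columns.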



Here is the resulting plot:

[image: alluvial plot with one bar per column]
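
The flows in the plot can be checked numerically with the cross-tabulation mentioned in the comments. A minimal sketch in base R, assuming the same example data as above (`flows12` and `shares` are names made up here, not part of the original answer):

```r
# Rebuild the example data (same structure as above)
set.seed(1)
data <- data.frame(id   = paste0("SP", 1:30),
                   COL1 = rep(sample(LETTERS[1:3]), 10),
                   COL2 = rep(sample(LETTERS[1:3]), 10),
                   COL3 = rep(sample(LETTERS[1:3]), 10))

# Number of samples going from each COL1 outcome to each COL2 outcome
flows12 <- table(COL1 = data$COL1, COL2 = data$COL2)
print(flows12)

# Share of each outcome within each column (the segment heights of the bars)
shares <- sapply(data[c("COL1", "COL2", "COL3")],
                 function(x) prop.table(table(factor(x, levels = LETTERS[1:3]))))
print(shares)
```

Because each example column is a single shuffled A, B, C repeated ten times, every outcome has a share of 1/3 and each COL1 outcome maps to exactly one COL2 outcome. With real classifier predictions the table would normally have more non-zero cells, and the plot correspondingly more flows.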

s__