
I'm trying to create a Sankey diagram for the following dataset (categorical variables only), but I'm having no luck setting up the sankeyNetwork parameters (target, source, value). My code is below.
Could you please help me figure out what is wrong here?

node_names <- unique(c(as.character(sk_dataset$Race), as.character(sk_dataset$Gender)))
nodes <- data.frame(name=node_names)


links <- data.frame(source=match(sk_dataset$Gender, node_names) -1,
                   target = match(sk_dataset$Race, node_names) -1,
                   value=c(2,3, 2, 3, 1, 3))

sankeyNetwork(Links=links, Nodes=nodes,Source="source",
             Target="target", Value="value") 


Example of what I want to achieve: [Example image]

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Do not share data as an image. We can't copy/paste that into R for testing. Also be sure to include all relevant `library()` calls in your sample code. – MrFlick Aug 12 '21 at 03:54

1 Answer

library(data.table)
library(networkD3)

sk_dataset <- fread('sof/DT.csv')
sk_dataset

sk_dataset looks something like this (I took the first 15 rows from the image):

https://i.stack.imgur.com/cSBRl.png

Create a frequency table by gender and race.

t1 <- sk_dataset[,.N,by = c('gender','race')]
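If you are not using data.table, the same gender-by-race counts can be sketched in base R with `aggregate()`. This is a minimal sketch assuming a data frame with lowercase `gender` and `race` columns; the hypothetical `sk_df` below is rebuilt from the counts in the t1 table that follows (3 black, 7 white, 5 hispanic, all male), not read from the original CSV.

```r
# Hypothetical stand-in for sk_dataset: 15 rows rebuilt from the t1 counts
sk_df <- data.frame(
  gender = rep("male", 15),
  race   = rep(c("black", "white", "hispanic"), times = c(3, 7, 5)),
  stringsAsFactors = FALSE
)

# Give every row a weight of 1, then sum the weights per (gender, race) pair.
# This mirrors sk_dataset[, .N, by = c('gender', 'race')] in data.table.
t1_base <- aggregate(N ~ gender + race,
                     data = transform(sk_df, N = 1),
                     FUN = sum)
print(t1_base)
```

If you build the table with dplyr's `group_by()` plus `tally()` (or `count()`) instead, the count column is named `n` rather than `N`, so the links step would use `value = t1$n`.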

The t1 frequency table looks like this:

gender  race      N
male    black     3
male    white     7
male    hispanic  5

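# Every distinct race and gender label becomes one node
node_names <- unique(c(as.character(sk_dataset$race), as.character(sk_dataset$gender)))
nodes <- data.frame(name = node_names)


# One link per (gender, race) row of t1; match() is 1-based,
# but networkD3 expects 0-based node indices, hence the -1
links <- data.frame(source = match(t1$gender, node_names) - 1,
                    target = match(t1$race, node_names) - 1,
                    value  = t1$N)

# NodeID names the column of `nodes` that holds the labels
sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "name")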
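Because the answer reads a local CSV, here is a self-contained sketch that types in the three rows of t1 from the table above so it can be run as-is. It follows the same approach: each link is one unique (source, target) pair with its count, and a node's ID is its zero-based position in `nodes`. The `requireNamespace()` guard is an addition that simply skips the plot when networkD3 is not installed.

```r
# t1 typed in from the frequency table above (gender, race, N)
t1 <- data.frame(
  gender = c("male", "male", "male"),
  race   = c("black", "white", "hispanic"),
  N      = c(3, 7, 5),
  stringsAsFactors = FALSE
)

# One node per distinct label; its position in this vector is its ID
node_names <- unique(c(t1$race, t1$gender))
nodes <- data.frame(name = node_names, stringsAsFactors = FALSE)

# match() is 1-based and networkD3 expects 0-based indices, hence the -1
links <- data.frame(
  source = match(t1$gender, node_names) - 1,
  target = match(t1$race, node_names) - 1,
  value  = t1$N
)

# Build the widget only when networkD3 is available
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                Source = "source", Target = "target",
                                Value = "value", NodeID = "name")
}
```

Aggregating first is also what the question's code is missing: there, `links` gets one row per data row while `value` is a hard-coded vector of six numbers, so `data.frame()` fails unless the row count happens to be a multiple of six, and even then repeated gender/race pairs are never combined into a single count.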

For more details, see: https://www.r-graph-gallery.com/322-custom-colours-in-sankey-diagram.html

gokhan can
  • Thank you so much!! I had to create the frequency table using group_by() and tally(), but the rest of the code worked perfectly. Could you please clarify how I would add more links and nodes? For instance, what if I wanted to add another set of nodes for Ideology? Thanks. – mndsnascimento Aug 12 '21 at 09:09
  • You're welcome, just don't forget to accept the answer so everyone can benefit from it. Have a nice one! – gokhan can Aug 12 '21 at 09:23
  • As for your last question: source means 'from', target means 'to', and value sets the thickness of each connection. We convert the names to indices because the plot works with (zero-based) node indices. As I said, for more information please review the link I shared. – gokhan can Aug 12 '21 at 09:40
  • Sorry, I'm new here. I didn't know I had to accept answers. Just did, thanks! – mndsnascimento Aug 13 '21 at 11:02
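For the follow-up about adding an Ideology column, one common approach is to count each pair of adjacent columns separately (gender to race, then race to ideology), stack the two edge lists with `rbind()`, and index everything against one shared node list. This is only a sketch: the `ideology` column, its values, and the helper `count_pairs()` are hypothetical and invented purely for illustration.

```r
# Hypothetical per-row data; the ideology values are made up for illustration
df <- data.frame(
  gender   = c("male", "male", "female", "female", "male"),
  race     = c("black", "white", "white", "hispanic", "white"),
  ideology = c("liberal", "conservative", "liberal", "moderate", "liberal"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count rows for each (from, to) pair of two columns
count_pairs <- function(data, from, to) {
  aggregate(list(value = rep(1, nrow(data))),
            by = list(from = data[[from]], to = data[[to]]),
            FUN = sum)
}

# Stack gender -> race and race -> ideology into one edge list
edges <- rbind(count_pairs(df, "gender", "race"),
               count_pairs(df, "race", "ideology"))

# One shared node list, so each race node is both a target and a source
node_names <- unique(c(edges$from, edges$to))
nodes <- data.frame(name = node_names, stringsAsFactors = FALSE)

links <- data.frame(
  source = match(edges$from, node_names) - 1,  # zero-based for networkD3
  target = match(edges$to, node_names) - 1,
  value  = edges$value
)

if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                Source = "source", Target = "target",
                                Value = "value", NodeID = "name")
}
```

Because all columns share one node list, a label that appears in two columns (for example an "other" category in both race and ideology) would collapse into a single node; prefixing each label with its column name is one way to keep them apart.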