3

I have a data frame with common features between 4 groups of patients and cell types. I have a lot of different features, but the shared ones (present in more than 1 group) are just a few.

I want to make a circos plot that reflects the few connections between shared features across groups of patients and cell types, while giving an idea of how many unshared features there are in each group.

The way I think of it, it should be a plot with 4 sectors (one for each group of patient and cell type) with a few connections between them. Each sector size should reflect the total number of features in the group, and most of this area should not be connected to other groups, but empty.

This is what I have so far, but I do not want sectors dedicated to each feature, only to each group of patient and cell type.

MWE:

library(circlize)

patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
dat

chordDiagram(as.data.frame(dat), transparency = 0.5)

fig1

EDIT!!

What @m-dz shows in his answer is actually the format I am looking for, 4 sectors for the 4 different patient/cell.type combinations, showing just the connections, while the non-connected features, though not shown, should account for the size of the sector.

However, I am realizing I have a more complicated scenario than the one in the MWE above.

A feature is considered to appear in 2 patient/cell.type groups, not only when it is identical in the 2 groups, but also when it is similar... (sequence identity above a threshold). This way, I have redundancies...

Feature A in patient1-cell1 can be connected to feature A in patient2-cell1, but also to feature B... Feature A should only be counted once (unique counts) for patient1-cell1, and expand to 2 different features in patient2-cell1.

See below an example of how my actual data looks like more precisely, and see if working with this example we can get the final circos plot! Thanks!!

##MWE
#NON OVERLAPPING SETS!

#1: non-shared features
nonshared <- data.frame(patient=c(rep("pat1",20), rep("pat2",10)), cell.type=c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4)), feature=paste("a",1:30,sep=''))
nonshared

#2: features shared between cell types within same patient
sharedcells <- data.frame(patient=c(rep("pat1",3), rep("pat2",4)), cell.types=c(rep("cell1||cell2",3),rep("cell1||cell2",4)), features=c("b1||b1","b1||b1","b1||b1","b2||b2","b3||b3","b4||b4","b4||b5"))
sharedcells

#3: features shared between patients within same cell types
sharedpats <- data.frame(patients=c(rep("pat1||pat2",2), rep("pat1||pat2",6)), cell.type=c(rep("cell1",2),rep("cell2",6)), features=c("c1||c1","c2||c1","c3||c3","c3||c4","c3||c5","c6||c5","c7||c7","c8||c8"))
sharedpats

#4: features shared between patients and cell types
#4.1: shared across pat1-cell1, pat1-cell2, pat2-cell1, pat2-cell2
sharedall1 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1||pat2-cell2",4)), features=c("d1||d1||d1||d1","d2||d2||d2||d3","d4||d4||d3||d3","d5||d5||d5||d5"))
#4.2: shared across pat1-cell1, pat1-cell2, pat2-cell1
sharedall2 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1",2)), features=c("d6||d6||d6","d7||d7||d7"))
#4.3: shared across pat1-cell1, pat1-cell2, pat2-cell2
sharedall3 <- data.frame(both="pat1-cell1||pat1-cell2||pat2-cell2", features="d8||d8||d9")
#4.4: shared across pat1-cell1, pat2-cell1, pat2-cell2
sharedall4 <- data.frame(both="pat1-cell1||pat2-cell1||pat2-cell2", features="d10||d10||d9")
#4.5: shared across pat1-cell2, pat2-cell1, pat2-cell2
sharedall5 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1||pat2-cell2",3)), features=c("d11||d11||d11","d12||d13||d13","d12||d14||d14"))
#4.6: shared across pat1-cell1, pat2-cell2
sharedall6 <- data.frame()
#4.7: shared across pat1-cell2, pat2-cell1
sharedall7 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1",2)), features=c("d15||d16","d17||d17"))

sharedall <- rbind(sharedall1, sharedall2, sharedall3, sharedall4, sharedall5, sharedall6, sharedall7)
sharedall
#you see there might be overlaps between the different subsets of sharedall, but not between sharedall, sharedparts, sharedcells, and nonshared

#I NEED A CIRCOS PLOT THAT SHOWS ALL THE CONNECTIONS. THE NON-CONNECTED (nonshared) FEATURES SHOULD NOT BE SHOWN, BUT THE SHOULD COUNT TO THE SIZE OF THE SECTOR (CORRESPONDING TO A PATIENT-CELL COMBINATION)

#THE FEATURES SHOULD BE COUNT UNIQUELY, SO IF THERE ARE ENTRIES LIKE:
#3 pat1||pat2     cell2   c3||c3
#4 pat1||pat2     cell2   c3||c4
#5 pat1||pat2     cell2   c3||c5
#THE FEATURE c3 SHOULD BE COUNT ONCE FOR pat1, AND EXPAND TO 3 DIFFERENT FEATURES IN pat2
DaniCee
  • 2,397
  • 6
  • 36
  • 59
  • I don't know `circlize` but it sounds as if doing `dplyr::summarize(dat, n_features = length(unique(features)))` might help –  Mar 21 '17 at 06:11
  • could you elaborate? I don't see where that would help – DaniCee Mar 21 '17 at 07:48
  • It sounds as if all you care about is the number of features, and you want to plot patients and cell types, rather than features. I guess I forgot the `group_by(patients, cell.types)`. –  Mar 21 '17 at 08:49
  • I still do not see how this would help in my case... could you write a full example? – DaniCee Mar 24 '17 at 02:08
  • Wow, those new data are a mess! Could you extract them when they are in the state of `nonshared` from the first line? I.e. with simple `patient, cell.type, feature` for every combination? If not, some string processing needs to be used and we are good to go. PS: I have the answer, just not where I am now unfortunately. Will share within ~12 hours. – m-dz Mar 28 '17 at 08:15
  • I guess that in order to make a unique data frame in the form of nonshared, I would need an extra column with unique identifiers for the groups in the other data frames, cause otherwise if I split the features in these groups I might lose the connection (you see that sometimes features in a group are not exactly the same); let me get a data frame like that and see how to move from there. Many thanks! – DaniCee Mar 29 '17 at 01:53

3 Answers3

5

A side note about the expected outcome: aim was to create a plot showing simply how much features are shared, ignoring individual features (1st plot below) or shared features overlap (e.g. on the 2nd plot it looks like the same features are shared between all groups, which is not true looking at the 1st plot, but what is important here is the ratio of features shared between groups).

Code below produces the following two figures (fig. 1 left for reference):

All individual features

enter image description here

Simple counts of unique and shared features

enter image description here

One of those should meet the expectations.

# Prep. data --------------------------------------------------------------

nonshared <- data.frame(patient=c(rep("pat1",20), rep("pat2",10)), cell.type=c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4)), feature=paste("a",1:30,sep=''))
sharedcells <- data.frame(patient=c(rep("pat1",3), rep("pat2",4)), cell.types=c(rep("cell1||cell2",3),rep("cell1||cell2",4)), features=c("b1||b1","b1||b1","b1||b1","b2||b2","b3||b3","b4||b4","b4||b5"))
sharedpats <- data.frame(patients=c(rep("pat1||pat2",2), rep("pat1||pat2",6)), cell.type=c(rep("cell1",2),rep("cell2",6)), features=c("c1||c1","c2||c1","c3||c3","c3||c4","c3||c5","c6||c5","c7||c7","c8||c8"))
sharedall1 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1||pat2-cell2",4)), features=c("d1||d1||d1||d1","d2||d2||d2||d3","d4||d4||d3||d3","d5||d5||d5||d5"))
sharedall2 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1",2)), features=c("d6||d6||d6","d7||d7||d7"))
sharedall3 <- data.frame(both="pat1-cell1||pat1-cell2||pat2-cell2", features="d8||d8||d9")
sharedall4 <- data.frame(both="pat1-cell1||pat2-cell1||pat2-cell2", features="d10||d10||d9")
sharedall5 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1||pat2-cell2",3)), features=c("d11||d11||d11","d12||d13||d13","d12||d14||d14"))
sharedall6 <- data.frame()
sharedall7 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1",2)), features=c("d15||d16","d17||d17"))
sharedall <- rbind(sharedall1, sharedall2, sharedall3, sharedall4, sharedall5, sharedall6, sharedall7)

#I NEED A CIRCOS PLOT THAT SHOWS ALL THE CONNECTIONS. THE NON-CONNECTED (nonshared) FEATURES SHOULD NOT BE SHOWN, BUT THE SHOULD COUNT TO THE SIZE OF THE SECTOR (CORRESPONDING TO A PATIENT-CELL COMBINATION)

#THE FEATURES SHOULD BE COUNT UNIQUELY, SO IF THERE ARE ENTRIES LIKE:
#3 pat1||pat2     cell2   c3||c3
#4 pat1||pat2     cell2   c3||c4
#5 pat1||pat2     cell2   c3||c5
#THE FEATURE c3 SHOULD BE COUNT ONCE FOR pat1, AND EXPAND TO 3 DIFFERENT FEATURES IN pat2



# Start -------------------------------------------------------------------

library(circlize)
library(data.table)
library(magrittr)
library(stringr)
library(RColorBrewer)

# Split and pad with 0 ----------------------------------------------------
fun <- function(x) unlist(tstrsplit(x, split = '||', fixed = TRUE))

nonshared %>% setDT()
sharedcells %>% setDT()
sharedpats %>% setDT()
sharedall %>% setDT()

nonshared <- nonshared[, .(group = paste(patient, cell.type, sep = '-'), feature)][, feature := paste0('a', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedcells <- sharedcells[, lapply(.SD, fun), by = 1:nrow(sharedcells)][, .(group = paste(patient, cell.types, sep = '-'), feature = features)][, feature := paste0('b', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedpats <- sharedpats[, lapply(.SD, fun), by = 1:nrow(sharedpats)][, .(group = paste(patients, cell.type, sep = '-'), feature = features)][, feature := paste0('c', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedall <- sharedall[, lapply(.SD, fun), by = 1:nrow(sharedall)][, .(group = both, feature = features)][, feature := paste0('d', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]

dt_split <- rbindlist(
  list(
    nonshared,
    sharedcells,
    sharedpats,
    sharedall
  )
)

# Set key and self join to find shared features ---------------------------
setkey(dt_split, feature)
dt_join <- dt_split[dt_split, .(group, i.group, feature), allow.cartesian = TRUE] %>%
  .[group != i.group, ]

# Create a "sorted key" ---------------------------------------------------
# key := paste(sort(.SD)...
# To leave only unique combinations of groups and features
dt_join <-
  dt_join[,
          key := paste(sort(.SD), collapse = '|'),
          by = 1:nrow(dt_join),
          .SDcols = c('group', 'i.group')
          ] %>%
  setorder(feature, key) %>%
  unique(by = c('key', 'feature')) %>%
  .[, .(
    group_from = i.group,
    group_to = group,
    feature = feature)]

# Rename and key ----------------------------------------------------------

dt_split %>% setnames(old = 'group', new = 'group_from') %>% setkey(group_from, feature)
dt_join %>% setkey(group_from, feature)



# Individual features -----------------------------------------------------

# Features without connections --------------------------------------------

dt_singles <- dt_split[, .(group_from, group_to = group_from, feature)] %>%
  .[, N := .N, by = feature] %>%
  .[!(N > 1 & group_from == group_to), !c('N')]

# Bind all, add some columns etc. -----------------------------------------

dt_bind <- rbind(dt_singles, dt_join) %>% setorder(group_from, feature, group_to)

dt_bind[, ':='(
  group_from_f = paste(group_from, feature, sep = '.'),
  group_to_f = paste(group_to, feature, sep = '.'))]
dt_bind[, feature := NULL]  # feature can be removed

# Colour
dt_bind[, colour := ifelse(group_from_f == group_to_f, "#FFFFFF00", '#00000050')]  # Change first to #FF0000FF to show red blobs

# Prep. sectors -----------------------------------------------------------

sectors_f <- union(dt_bind[, group_from_f], dt_bind[, group_to_f]) %>% sort()

colour_lookup <-
  union(dt_bind[, group_from], dt_bind[, group_to]) %>% sort() %>%
  structure(seq_along(.) + 1, names = .)
sector_colours <- str_replace_all(sectors_f, '.[a-d][0-9]+', '') %>%
  colour_lookup[.]

# Gaps between sectors ----------------------------------------------------

gap_sizes <- c(0.0, 1.0)
gap_degree <-
  sapply(table(names(sector_colours)), function(i) c(rep(gap_sizes[1], i-1), gap_sizes[2])) %>%
  unlist() %>% unname()
# gap_degree <- rep(0, length(sectors_f))  # Or no gap



# Plot! -------------------------------------------------------------------

# Each "sector" is a separate patient/cell/feature combination

circos.par(gap.degree = gap_degree)
circos.initialize(sectors_f, xlim = c(0, 1))
circos.trackPlotRegion(ylim = c(0, 1), track.height = 0.05, bg.col = sector_colours, bg.border = NA)

for(i in 1:nrow(dt_bind)) {
  row_i <- dt_bind[i, ]
  circos.link(
    row_i[['group_from_f']], c(0, 1),
    row_i[['group_to_f']], c(0, 1),
    border = NA, col = row_i[['colour']]
  )
}

# "Feature" labels
circos.trackPlotRegion(track.index = 2, ylim = c(0, 1), panel.fun = function(x, y) {
  sector.index = get.cell.meta.data("sector.index")
  circos.text(0.5, 0.25, sector.index, col = "white", cex = 0.6, facing = "clockwise", niceFacing = TRUE)
}, bg.border = NA)

# "Patient/cell" labels
for(s in names(colour_lookup)) {
  sectors <- sectors_f %>% { .[str_detect(., s)] }
  highlight.sector(
    sector.index = sectors, track.index = 1, col = colour_lookup[s],
    text = s, text.vjust = -1, niceFacing = TRUE)
}

circos.clear()



# counts of unique and shared features ------------------------------------

xlims <- dt_split[, .N, by = group_from][, .(x_from = 0, x_to = N)] %>% as.matrix()
links <- dt_join[, .N, by = .(group_from, group_to)]
colours <- dt_split[, unique(group_from)] %>% structure(seq_along(.) + 1, names = .)

library(circlize)

sectors = names(colours)
circos.par(cell.padding = c(0, 0, 0, 0))
circos.initialize(sectors, xlim = xlims)
circos.trackPlotRegion(ylim = c(0, 1), track.height = 0.05, bg.col = colours, bg.border = NA)

for(i in 1:nrow(links)) {
  link <- links[i, ]
  circos.link(link[[1]], c(0, link[[3]]), link[[2]], c(0, link[[3]]), col = '#00000025', border = NA)
}

# "Patient/cell" labels
for(s in sectors) {
  highlight.sector(
    sector.index = s, track.index = 1, col = colours[s], 
    text = s, text.vjust = -1, niceFacing = TRUE)
}

circos.clear()

Edit: Just adding the link from removed comment: See this answer for a nice example of labelling!

Community
  • 1
  • 1
m-dz
  • 2,342
  • 17
  • 29
  • Ok this seems to be what I need; but I am still struggling to obtain a data frame like the one you use, from the data frame I use in my MWE... if you could include that step and make the solution more tailored to the data I presented, I accept this answer. Thanks! – DaniCee Mar 25 '17 at 12:27
  • Ok, this looks like the exact format I am looking for. I am checking my actual data, and might have a more difficult scenario... I have cases where the same feature in a group can connect to different features in another group, so there is not 1 to 1 relation, but 2 to 1 in some cases; so we might be looking at a origin->destination kind of thing. I will edit tomorrow to reflect this, and see if it can be reflected in the plot... – DaniCee Mar 27 '17 at 10:04
  • many thanks @m-dz! Please check my EDIT that shows more precisely how my actual data looks like. Thanks! – DaniCee Mar 28 '17 at 02:41
  • Thanks so much!! I don't have time to look at this now, will do tomorrow or Friday (Singapore time), but at a quick sight I just have a couple of questions. The plot is not reflecting the non-shared features, is it? I expect just few connections coming out of wide sectors, like the result you got before. And the other question is what does the red color represent? Thank you again! – DaniCee Mar 29 '17 at 09:49
  • Exactly, just show the shared features between the sectors – DaniCee Mar 29 '17 at 10:39
  • but the non-shared features should still be counted to make up the size of a sector (if we graduate a sector by number of features on it, say for example patient1-cell1 has 30 features, but only 12 are shared with other sectors; the size of the sector itself would be 30, but the connections I want to see in the inner part of the plot are 12) – DaniCee Mar 29 '17 at 10:41
  • "nonshared" is the biggest data frame, so there should be more empty spaces than connections shown I guess – DaniCee Mar 29 '17 at 10:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/139353/discussion-between-m-dz-and-danicee). – m-dz Mar 29 '17 at 10:58
  • Just to clarify, we have agreed with @DaniCee to clean up the comments. – m-dz Mar 31 '17 at 14:01
  • Hi @m-dz! I am now having the time to really adapt my own data to all this, and I am spotting inaccuracies in the plots in your answer... can we open another chat and talk about it? For example, if you check the original "sharedall" data frame, you would spot this line: "pat1-cell1||pat1-cell2||pat2-cell2 d8||d8||d9" that means feature d8 in pat1-cell1 and in pat1-cell2 should be connected to d9 in pat2-cell2, however, I don't see d9 in pat2-cell2 in the first plot (I assume it's lost in the second plot too), and like this, with all cases were non-identical features are connected – DaniCee Apr 03 '17 at 03:22
  • Remember when I said that the connections are not always between identical features... that's why I said we need a third column specifying the connection if you need the data frames in a unique feature basis... Can you open another chat so we can discuss it if you have some time? Many many thanks! – DaniCee Apr 03 '17 at 03:22
  • So these plots do not seem to be reflecting any of the connections between non-identical features in the original files... Do you see what I mean? Let me know! Thanks – DaniCee Apr 03 '17 at 03:36
  • Okay, I see it now. I will have a look at this in ~5h from now. – m-dz Apr 03 '17 at 14:21
  • @DaniCee, could you prepare a smaller example (like `"pat1-cell1||pat1-cell2||pat2-cell2 d8||d8||d9"` and 4 other rows) and sketch your desired plot? If you only care about counts, maybe you can simply rename the features `"pat1-cell1||pat1-cell2||pat2-cell2 d8||d8||d9"` --> `"pat1-cell1||pat1-cell2||pat2-cell2 d001||d001||d001"` and use the above code? – m-dz Apr 04 '17 at 21:29
  • I am not sure I am getting what you mean... could you open a new chat to discuss it? – DaniCee Apr 05 '17 at 01:22
  • Let's use [the old chat](http://chat.stackoverflow.com/rooms/139353/discussion-between-m-dz-and-danicee). – m-dz Apr 05 '17 at 06:39
3

@m-dz provides a correct direction. I can provide more details regarding to your simulated data.

Let's start from here:

patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))

as.data.frame converts dat to a three-column data frame (i.e. an adjacency list where links start from the first column and point to the second column)

dat = as.data.frame(dat, stringsAsFactors = FALSE)

Generate colors for patients/cells and features.

features = unique(dat[[2]])
features_col = structure(rand_color(length(features)), names = features)
patients_col = structure(2:5, names = unique(dat[[1]]))

If a feature only exists in one patient/cell combination, you don't want to show it but still want to keep its position in the plot, you can just set #FFFFFF00 as its color (a white color with full transparency so that it will not cover other links). Here we want the link color to be same as the feature sectors.

col = ifelse(dat[[3]], features_col[dat[[2]]], "#FFFFFF00")
col = gsub("FF$", "80", col) # half transparent
features_count = tapply(dat[[3]], dat[[2]], sum)
# set color to white if it only exists in one patient/cell
col[features_count[dat[[2]]] == 1] = "#FFFFFF00" 

And the final chord diagram:

chordDiagram(dat, col = col, grid.col = c(features_col, patients_col))

You can see in the feature sectors there are at least two links pointing to patients/cells.

enter image description here

Zuguang Gu
  • 1,301
  • 7
  • 11
  • plus one for working with my actual MWE data, but I would need something more like what @m-dz proposes, just 4 sectors connecting the number of features in common... could you edit to reflect that? Thanks! – DaniCee Mar 25 '17 at 14:51
  • @Zuguang Gu i have been struggling with this circos plot i tried the circos written in perl ,my knowledge is not enough to implement what i want to show your package is preety powerful but im not sure how to implement it my question with all the data files https://stackoverflow.com/questions/54092019/circos-plot-issue-with-genes-and-patient-mutation-frequency i would be glad if you can look and suggest me something to start with.. – kcm Jan 17 '19 at 13:57
1

get data prepared

    library(circlize)
    patients <- c(rep("patient1",20), rep("patient2",10))
    cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
    features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"),     paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
    dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
    dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
    dat<-as.data.frame(dat,stringsAsFactors = FALSE)

enter image description here

get all combination of patient and celltypes

    df=NULL
    for(i in levels(as.factor(dat$feature))){
        temp<-as.data.frame(matrix(combn(dat[which(dat$feature==i),1],2),byrow = TRUE,ncol=2),stringsAsFactors = FALSE)
        temp$feature=i
        temp$Freq=1
        Freq_0<-subset(dat$Var1,dat$feature==i & dat$Freq==0)
        for(j in Freq_0){
          temp$Freq[temp$V1==j | temp$V2==j]=0
        }
        df<-rbind(df,temp)
    }

enter image description here

add color

    df$color=rainbow(dim(df)[1])
    df[which(df$Freq==0),5]="white"
    df$Freq=1
    chordDiagram(df[,c(-3,-5)], transparency = 0.5,col = df$color)

different link means different feature,and the link color is white where 'Freq' is 0 enter image description here

I turn the color 'white' into 'black' while black is more conspicuous enter image description here

If you want to left behind the 'feature' attribute...... let's get data prepared first

    library(circlize)
    patients <- c(rep("patient1",20), rep("patient2",10))
    cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
    features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
    dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
    dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
    dat<-as.data.frame(dat,stringsAsFactors = FALSE)
    df=NULL
    for(i in levels(as.factor(dat$feature))){
      temp<-as.data.frame(matrix(combn(dat[which(dat$feature==i),1],2),byrow = TRUE,ncol=2),stringsAsFactors = FALSE)
      temp$feature=i
      temp$Freq=1
      Freq_0<-subset(dat$Var1,dat$feature==i & dat$Freq==0)
      for(j in Freq_0){
        temp$Freq[temp$V1==j | temp$V2==j]=0
      }
      df<-rbind(df,temp)
    }

processed it

    library(dplyr)
    df1<-subset(df,df$Freq==1)
    df0<-subset(df,df$Freq==0)
    df1_mod<-summarise(group_by(df1,V1,V2),Freq=n())
    df0_mod<-summarise(group_by(df0,V1,V2),Freq=n())

add color

    df1_mod$color<-rainbow(5)
    df0_mod$color<-"white"
    df_res<-rbind(df0_mod,df1_mod)

enter image description here

draw it

chordDiagram(df_res, transparency = 0.5,col = df_res$color)

enter image description here

enter image description here

These picture shows there are many zeros in the 'Freq'.

Liun
  • 117
  • 2