0

My table is like this:

  ID   feature v1    v2   v3
  1     mpg    
  1     ap     0.8
  1     cs     0.7   0.8
  1     de     0.2   0.6  1.2
  2     A
  2     B      0.5
  2     C      0.3   0.4  

1. The logic is column Permutations, for example: 0.8= mpg+ap, 0.7=mpg+cs, 0.2=mpg+de, 0.6=ap+de, 0.5=A+B

2. ID is differentiate the two group of permutations (first group is mpg, ap,cs,de. second group is A,B,C)

I want to convert to another data frame format which is like this:

ID  Feature1 Feature2 value
 1     mpg      ap     0.8
 1     mpg      cs     0.7
 1     mpg      de     0.2
 1     ap       cs     0.6
 1     ap       de     0.8
 1     cs       de     1.2
 2     A        B      0.5
 2     A        C      0.3
 2     B        C      0.4

Anyone has the suggestion of how to convert this structure (R>3.6.0)?

Smile
  • 27
  • 5

2 Answers2

1

I was able to produce the output you requested using dplyr and reshape2. However, I believe you should try working on your data format (where are you getting this data from) in order to be able to write some more generally applicable code.

  data <- data.frame(ID = c(1,1,1,1,2,2,2), feature = c("mpg", "ap", "cs", "de", "A", "B", "C"), v1 = c(NA, 0.8,0.7,0.2,NA, 0.5,0.3), v2 = c(NA, NA, 0.8, 0.6, NA, NA, 0.4), v3 = c(NA, NA, NA, 1.2, NA, NA, NA))

  data %>% 
    group_by(ID) %>% 
    do({
      dfPermutations <- .
      names(dfPermutations) <- 
        c("ID", "feature1", as.character(dfPermutations$feature[1:(ncol(dfPermutations)-2)]))
      dfPermutations %>% 
        reshape2::melt(id.vars = c("ID", "feature1"), variable.name = "feature2") %>% 
        filter(!is.na(value))
    }) %>% 
    ungroup()

``
Jeroen Colin
  • 345
  • 1
  • 6
1

For base R, maybe the following code can help you:

dfs <- split(df,df$ID)

r <- Reduce(rbind,lapply(seq_along(dfs), function(k) {
  v <- dfs[[k]]; 
  f <- combn(v$"feature",2); 
  data.frame(ID=rep(names(dfs[k]),ncol(f)), Feature1=f[1,], Feature2=f[2,], value=unlist(dfs[[k]][-(1:2)][!is.na(dfs[[k]][-(1:2)])]))}))

yielding:

> r
  ID Feature1 Feature2 value
1  1      mpg       ap   0.8
2  1      mpg       cs   0.7
3  1      mpg       de   0.2
4  1       ap       cs   0.8
5  1       ap       de   0.6
6  1       cs       de   1.2
7  2        A        B   0.5
8  2        A        C   0.3
9  2        B        C   0.4

DATA

df <- data.frame(ID = c(1,1,1,1,2,2,2), feature = c("mpg", "ap", "cs", "de", "A", "B", "C"), v1 = c(NA, 0.8,0.7,0.2,NA, 0.5,0.3), v2 = c(NA, NA, 0.8, 0.6, NA, NA, 0.4), v3 = c(NA, NA, NA, 1.2, NA, NA, NA))
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81