I have a data set with 399 rows and 7 columns. Each row contains some values and some NAs. For each row, I want to take every possible combination of 3 of its non-NA elements, compute the standard deviation of each combination, and store the results in a new data frame. For example, if row one has 4 elements, then row one of the new data frame should have 4 columns (since choose(4, 3) = 4), holding the standard deviations of all the 3-element combinations of row 1 of the original data set. This is the head of the original data set:

         V1        V2        V3        V4        V5        V6        V7
1 0.0853146 0.0809561 0.1350686        NA        NA        NA        NA
2 0.0788104 0.0964276 0.1222457 0.0853146        NA        NA        NA
3 0.1086917 0.0818920 0.0479148 0.0981603 0.0788104        NA        NA
4 0.0811772 0.1088340 0.1823510 0.0809561 0.0964276 0.1086917        NA
5 0.1015970 0.1089944 0.1243186 0.0858065 0.0842896 0.0818920 0.0811772
6 0.0639869 0.1496792 0.1704337 0.1088340 0.1015970        NA        NA
7 0.0619823 0.0962283 0.1089944 0.0639869        NA        NA        NA

The problem is that I can't remove the NAs, so I get the wrong number of combinations and therefore the wrong number of standard deviations. Here is what I came up with, but it does not work:

mydf<-as.matrix(df, na.rm=TRUE)
row<-apply(mydf, na.rm=TRUE, MARGIN = 1, FUN =combn, m=3, simplify = TRUE)
row<-as.matrix((row))
stdeviation<-apply(row,MARGIN = 1, FUN=sd,na.rm=TRUE)
stdeviation<-as.data.frame(stdeviation)

The table of the combinations looks like this for row 2:

V1                  V2                  V3
0.0788104313282292  0.0964276223058486  0.122245745410429
0.0788104313282292  0.0964276223058486  0.0853146853146852
0.0788104313282292  0.122245745410429   0.0853146853146852
0.0964276223058486  0.122245745410429   0.0853146853146852
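
For reference, a table like the one above can be reproduced in base R by dropping the NAs from the row before calling `combn()`. This is only a minimal sketch: the values are the rounded ones from the printed head, and the names `row2`, `vals` and `combos` are made up for illustration.

```r
# Row 2 of the sample data as printed in the question (rounded to 7 digits).
row2 <- c(V1 = 0.0788104, V2 = 0.0964276, V3 = 0.1222457, V4 = 0.0853146,
          V5 = NA, V6 = NA, V7 = NA)

# Drop the NAs first, so only real values take part in the combinations.
vals <- unname(row2[!is.na(row2)])

# combn() returns one combination per column; transpose to get one per row.
combos <- t(combn(vals, 3))
print(combos)
```

With 4 non-NA values this gives choose(4, 3) = 4 rows of 3 values each, in the same order as the table.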

The output for the second row, which I managed to compute, looks like this:

                V1            V2        V3        V4
stdeviation 0.02184631 0.008908499 0.02342661 0.01894719
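
One possible way to extend this to every row is sketched below. As far as I can tell, the attempt above fails because neither `as.matrix()` nor `apply()` acts on `na.rm`; the argument is just passed along, so the NAs are still in each row when `combn()` runs. The sketch removes the NAs inside a per-row function instead. It assumes the sample data is the 7-row head shown above, and the names `sd_of_triples`, `sds` and `result` are hypothetical. Because the rows produce different numbers of combinations, the per-row results are kept in a list and only padded with NA at the end.

```r
# Sample data copied from the head shown in the question.
df <- data.frame(
  V1 = c(0.0853146, 0.0788104, 0.1086917, 0.0811772, 0.1015970, 0.0639869, 0.0619823),
  V2 = c(0.0809561, 0.0964276, 0.0818920, 0.1088340, 0.1089944, 0.1496792, 0.0962283),
  V3 = c(0.1350686, 0.1222457, 0.0479148, 0.1823510, 0.1243186, 0.1704337, 0.1089944),
  V4 = c(NA, 0.0853146, 0.0981603, 0.0809561, 0.0858065, 0.1088340, 0.0639869),
  V5 = c(NA, NA, 0.0788104, 0.0964276, 0.0842896, 0.1015970, NA),
  V6 = c(NA, NA, NA, 0.1086917, 0.0818920, NA, NA),
  V7 = c(NA, NA, NA, NA, 0.0811772, NA, NA)
)

# Hypothetical helper: sd of every 3-element combination of the non-NA values.
sd_of_triples <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3) return(numeric(0))  # combn() needs at least 3 values here
  apply(combn(x, 3), 2, sd)              # combn() gives one combination per column
}

# One numeric vector per row; the lengths differ from row to row.
m <- as.matrix(df)
sds <- lapply(seq_len(nrow(m)), function(i) sd_of_triples(unname(m[i, ])))

# Optional: pad with NA to a common width so the results fit in a data frame.
max_len <- max(lengths(sds))
result <- as.data.frame(do.call(rbind, lapply(sds, function(v) {
  c(v, rep(NA_real_, max_len - length(v)))
})))

sds[[2]]  # the four standard deviations for row 2
```

Keeping `sds` as a list avoids the padding entirely if a rectangular data frame is not really needed; the padded `result` has as many columns as the row with the most combinations.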
  • Please provide some sample data to have a reproducible example (ideally with dput) – Robin Gertenbach Aug 22 '17 at 08:15
  • Here is how I did it for the 2nd row of the data frame: `row <- df[2, ]`; to remove the NAs, `row <- row[, colSums(is.na(row)) == 0]`; then I make the combinations with `row <- combn(row, m = 3, simplify = FALSE); row <- rbindlist(row); row <- as.matrix(row); stdeviation <- NULL`; then I calculate the sd of each combination with `stdeviation <- apply(row, MARGIN = 1, FUN = sd)` – Simone Castelli Aug 22 '17 at 08:24
  • Please have a look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example to see how to create a reproducible example. – elevendollar Aug 22 '17 at 08:45
  • Sorry, I'm new to asking here... I think I made the question more understandable, hopefully. – Simone Castelli Aug 22 '17 at 09:50
  • Do you want a separate data.frame for the total number of possible combinations of each row, or do you want all the possible combinations of **all the rows** in the same data.frame with 3 columns? Also, it would be great if you could share the output of `dput(dataset)` in the question. – tushaR Aug 22 '17 at 09:59
  • For each row, get the possible combinations (of 3 elements each) of all the values of that row. I don't care about columns. – Simone Castelli Aug 22 '17 at 10:20
  • Here is the dput: structure(c(0.0218463094061084, 0.00890849881135809, 0.0234266053492431, 0.0189471899807873), .Dim = c(1L, 4L), .Dimnames = list("stdeviation", NULL)) – Simone Castelli Aug 22 '17 at 10:22

0 Answers