
I have a list with several elements (up to 45). Some rows are identical across elements: I would like to combine these elements and then remove the redundant rows, keeping only one copy of each duplicated row.

This is a reproducible example (this is the file data):

               OTU0001 OTU0004 OTU0014 OTU0016 OTU0017 OTU0027 OTU0029 OTU0030
Sample_10.rare       1       1      86       1       1       1       1       1
Sample_11.rare       1      43     170       1      43     128       1      86
Sample_12.rare       1       1       1       1       1       1       1      43
Sample_13.rare     763     551    2160     128     551       1     678    1398

ncol=ncol(data)

rest<-ncol%%2
blocks<-ncol%/%2
ngroup <- rep(1:blocks, each = 2)
split <- split(1:ncol,ngroup)

combs <- expand.grid(1:length(split), 1:length(split))
combs <- t(apply(combs, 1, sort))
combs <- unique(combs)
combs <- combs[combs[,1] != combs[,2],]


cor_rho <- function(y) {
  resMAT <- foreach(i = seq_len(ncol(y)),
                    .combine = rbind,
                    .multicombine = TRUE,
                    .inorder = TRUE,  # keep rows in column order so the names below line up
                    .packages = c('data.table', 'doParallel')) %dopar% {
    apply(y, 2, function(x) 1 - ((var(y[, i] - x)) / (var(y[, i]) + var(x))))
  }
  colnames(resMAT) <- rownames(resMAT) <- colnames(y)
  Df <- data.frame(var1 = rownames(resMAT)[row(resMAT)[upper.tri(resMAT)]],
                   var2 = colnames(resMAT)[col(resMAT)[upper.tri(resMAT)]],
                   corr = resMAT[upper.tri(resMAT)])
  return(Df)
}


res <- foreach(i = seq_len(nrow(combs))) %dopar% {
 G1 <- split[[combs[i,1]]]
 G2 <- split[[combs[i,2]]]
 dat.i <- cbind(data[,G1], data[,G2])
 rho.i <- cor_rho(dat.i)
}

res #I get my list

[[5]]
     var1    var2      corr
1 OTU0014 OTU0016 0.1214562
2 OTU0014 OTU0029 0.5875550
3 OTU0016 OTU0029 0.3624304
4 OTU0014 OTU0030 0.9136386
5 OTU0016 OTU0030 0.1853840
6 OTU0029 OTU0030 0.7980875

[[6]]
     var1    var2        corr
1 OTU0017 OTU0027 -0.11770325
2 OTU0017 OTU0029  0.97129390
3 OTU0027 OTU0029 -0.12081013
4 OTU0017 OTU0030  0.68441352
5 OTU0027 OTU0030 -0.05400953
6 OTU0029 OTU0030  0.79808749

Thanks

VPailler
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Share data in a way that we can copy/paste it into R. Avoid `...` since that's not valid data. – MrFlick May 07 '19 at 14:33
  • Have a look at `unique` and `duplicated` – Clemsang May 07 '19 at 14:41

2 Answers


Assuming the list is L (shown reproducibly in the Note at the end), combine it into a single data frame with an id column, eliminate duplicate rows, and split it back into a list:

library(dplyr)
library(purrr)

L %>%
  map_dfr(identity, .id = "id") %>%
  filter(!duplicated(.[-1])) %>%
  { split(.[-1], .$id) }

Note

L <-
list(structure(list(V1 = 186:190, V2 = structure(1:5, .Label = c("OTU0726", 
"OTU0731", "OTU0733", "OTU0735", "OTU0737"), class = "factor"), 
    V3 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "OTU0748", class = "factor"), 
    V4 = c(0.514903312, 0.22825604, 0.491201489, 0.897293588, 
    -0.216130167)), class = "data.frame", row.names = c(NA, -5L
)), structure(list(V1 = 186:190, V2 = structure(1:5, .Label = c("OTU0726", 
"OTU0731", "OTU0733", "OTU0735", "OTU0737"), class = "factor"), 
    V3 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "OTU0748", class = "factor"), 
    V4 = c(0.514903312, 0.22825604, 0.491201489, 0.897293588, 
    -0.216130167)), class = "data.frame", row.names = c(NA, -5L
)))
G. Grothendieck
  • Thanks, this will be very helpful. However, I get slightly different values in my third column (I edited my post). For example, compare `6 OTU0029 OTU0030 0.7980875` and `6 OTU0029 OTU0030 0.79808749`. I would like to round these values so that they compare as equal, and then keep only one unique row. – VPailler May 07 '19 at 14:57
  • Round V4 by using this line instead of the filter line shown: `filter(! duplicated(mutate(.[-1], V4 = round(V4, 2))))` – G. Grothendieck May 07 '19 at 15:23
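
As a minimal sketch of the rounding idea from the comment above, the following applies it to the column names from the question (`var1`, `var2`, `corr`). The list `res_sub` is a hypothetical two-element excerpt built from rows shown in the question, not the full `res`, and the intermediate names `combined`, `key`, `kept`, and `deduped` are introduced here only for illustration.

```r
library(dplyr)

# Hypothetical excerpt of `res`: two rows each from elements [[5]] and [[6]]
res_sub <- list(
  data.frame(var1 = c("OTU0014", "OTU0029"),
             var2 = c("OTU0030", "OTU0030"),
             corr = c(0.9136386, 0.7980875)),
  data.frame(var1 = c("OTU0017", "OTU0029"),
             var2 = c("OTU0030", "OTU0030"),
             corr = c(0.68441352, 0.79808749))
)

# Stack the list into one data frame, recording the source element in `id`
combined <- bind_rows(res_sub, .id = "id")

# Compare rows on var1, var2 and the rounded coefficient, ignoring `id`
key <- mutate(combined[-1], corr = round(corr, 2))
kept <- combined[!duplicated(key), ]

# Split back into a list; the original (unrounded) values are preserved
deduped <- split(kept[-1], kept$id)
```

Two caveats. An element whose rows are all duplicates of earlier rows disappears from the result, because `split` only creates groups for ids that are still present. Also, values that sit close to a rounding boundary can round to different numbers, so if a given pair of OTUs always yields the same coefficient, it may be more robust to test for duplicates on `var1` and `var2` only.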

Here's a base R solution. You don't give a reproducible example, so I am assuming that the data frames in your list all have the same column names.

a_list <- list(data.frame(A = c(1,2,3), B = c(2,5,6), C= c(3,8,9)),
               data.frame(A = c(1,11,12), B = c(2,2,3), C = c(3,14,15)))
## prepare a list of data.frames

b_data <- unique(do.call(rbind, a_list))
## use the unique function to strip out the duplicated rows after you've used the
## do.call function to bind the list of data.frames into one big data.frame 

If you compare the list with the b_data data.frame, you'll find that row 4 has been dropped because it duplicates row 1.
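
If the result should remain a list with one data frame per original element, the same base R approach can be extended by remembering which element each row came from. This is only a sketch under that assumption; the names `id`, `stacked`, `keep`, and `deduped` are not part of the answer above.

```r
a_list <- list(data.frame(A = c(1, 2, 3),   B = c(2, 5, 6), C = c(3, 8, 9)),
               data.frame(A = c(1, 11, 12), B = c(2, 2, 3), C = c(3, 14, 15)))

# Index of the source list element for every row of the stacked data frame
id <- rep(seq_along(a_list), times = vapply(a_list, nrow, integer(1)))

# Bind everything together and flag the first occurrence of each row
stacked <- do.call(rbind, a_list)
keep <- !duplicated(stacked)

# Drop the repeated rows, then split back by source element
deduped <- split(stacked[keep, ], id[keep])
```

Here `deduped[["1"]]` keeps all three of its rows, while `deduped[["2"]]` loses its first row because that row already appears in the first element.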

DarrenRhodes