I have a list of dataframes called list and it looks like this:

list[[1]]

X1 X2 X3 X4
a  1  b  c 
d  2  e  f
g  3  h  i
j  4  k  l

list[[2]]

X1 X2 X3 X4
a  1  b  c
d  2  e  f
g  2  h  i
j  3  k  l

list[[3]]

X1 X2 X3 X4    
a  1  b  c
d  2  e  f
g  3  h  i
j  4  k  l

I have been trying to use lapply to loop through the list and print out all the duplicates in column X2 of each dataframe.

I'm not able to figure this out. Would appreciate any help. Thanks.

I've tried:

lapply(list, function(i) {
  if(length(unique(i[X2])) != length(i[X2])) {
    print(i[X2][duplicated(i[X2]))
  } else {
    print("No duplicates")
  }
})
tora0515

2 Answers

We can use lapply to find the duplicated indices in column X2 of each data frame and return the unique duplicated values.

lapply(list_df, function(x) {
   inds <- duplicated(x$X2)
   if(any(inds)) unique(x$X2[inds]) else "No duplicates"
})

#[[1]]
#[1] "No duplicates"

#[[2]]
#[1] 2

#[[3]]
#[1] "No duplicates"

I use list_df instead of list as the variable name because list is the name of a base R function.
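
If the full rows are wanted rather than only the repeated values, the same idea can be extended. This is only a sketch: the sample data is a cut-down, hypothetical version of the question's list (columns X1 and X2 only), and dup_rows is a name made up for illustration.

```r
# Hypothetical sample data mirroring the question's three data frames
list_df <- list(
  data.frame(X1 = c("a", "d", "g", "j"), X2 = 1:4),
  data.frame(X1 = c("a", "d", "g", "j"), X2 = c(1L, 2L, 2L, 3L)),
  data.frame(X1 = c("a", "d", "g", "j"), X2 = 1:4)
)

# Keep every row whose X2 value occurs more than once in that data frame
dup_rows <- lapply(list_df, function(x) {
  x[x$X2 %in% x$X2[duplicated(x$X2)], , drop = FALSE]
})

dup_rows[[2]]  # the two rows with X2 == 2
```

Data frames without duplicates come back with zero rows here instead of the "No duplicates" string.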

Ronak Shah

We can use table to get the frequency of each value in column 'X2' and then extract the names of the entries whose frequency is greater than 1:

lapply(list, function(x) {
   x1 <- names(which(table(x$X2) > 1))
   if(length(x1) == 0) "No duplicates" else x1
})
#[[1]]
#[1] "No duplicates"

#[[2]]
#[1] "2"

#[[3]]
#[1] "No duplicates"

Or using duplicated

lapply(list, function(x) unique(x$X2[duplicated(x$X2)|duplicated(x$X2, 
          fromLast = TRUE)]))
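
A small sketch of what the two duplicated calls flag, using a hypothetical vector x that matches X2 in the second data frame. The names first_only and both_ways are made up for illustration.

```r
x <- c(1L, 2L, 2L, 3L)  # hypothetical vector, same values as X2 in list[[2]]

# duplicated() alone flags only the later occurrence of a repeated value
later <- duplicated(x)

# OR-ing with fromLast = TRUE flags every occurrence of a repeated value
every <- duplicated(x) | duplicated(x, fromLast = TRUE)

# Once unique() is applied, both forms give the same set of values
first_only <- unique(x[later])
both_ways  <- unique(x[every])
```

The fromLast form matters mainly when all the repeated elements (or their rows) are wanted rather than just the distinct values. Note that this version returns an empty integer vector, not "No duplicates", for data frames without repeats.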

Or another option is to stack after extracting the column and get the index of duplicate elements with table and which

which(table(stack(setNames(lapply(list, `[[`, "X2"),
      seq_along(list)))[2:1]) > 1, arr.ind = TRUE)
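
With arr.ind = TRUE, which returns row and column positions in the table, not the values themselves. The sketch below, on hypothetical columns (cols, tbl and hits are made-up names), shows one way to map those positions back to the list element and the duplicated value.

```r
# Hypothetical X2 columns from two data frames
cols <- list(c(1L, 5L, 5L), c(7L, 8L, 9L))

# Rows of the table are list elements, columns are the distinct values
tbl  <- table(stack(setNames(cols, seq_along(cols)))[2:1])
hits <- which(tbl > 1, arr.ind = TRUE)

# Translate positions back into labels
which_df  <- rownames(tbl)[hits[, 1]]  # list element containing the duplicate
which_val <- colnames(tbl)[hits[, 2]]  # duplicated value, as a character string
```

In the question's data the column position and the value happen to coincide, which is easy to misread; here they differ.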

Or another option, using the tidyverse, is

library(tidyverse)
map(list, ~ .x %>%
              count(X2) %>%
              filter(n > 1) %>%
              pull(X2))

data

list <- list(structure(list(X1 = c("a", "d", "g", "j"), X2 = 1:4, X3 = c("b", 
"e", "h", "k"), X4 = c("c", "f", "i", "l")), class = "data.frame", row.names = c(NA, 
-4L)), structure(list(X1 = c("a", "d", "g", "j"), X2 = c(1L, 
2L, 2L, 3L), X3 = c("b", "e", "h", "k"), X4 = c("c", "f", "i", 
"l")), class = "data.frame", row.names = c(NA, -4L)), structure(list(
    X1 = c("a", "d", "g", "j"), X2 = 1:4, X3 = c("b", "e", "h", 
    "k"), X4 = c("c", "f", "i", "l")), class = "data.frame", row.names = c(NA, 
-4L)))
akrun