R extract unique row values in a column in a dataframe in a list

Question

I have a list of dataframes called list and it looks like this:

list[[1]]

X1 X2 X3 X4
a  1  b  c 
d  2  e  f
g  3  h  i
j  4  k  l

list[[2]]

X1 X2 X3 X4
a  1  b  c
d  2  e  f
g  2  h  i
j  3  k  l

list[[3]]

X1 X2 X3 X4    
a  1  b  c
d  2  e  f
g  3  h  i
j  4  k  l

I have been trying to use lapply to loop through the list and print out all the duplicates in column X2 of each dataframe.

I'm not able to figure this out. Would appreciate any help. Thanks.

I've tried:

lapply(list, function(i) {
  if(length(unique(i[X2])) != length(i[X2])) {
    print(i[X2][duplicated(i[X2]))
  } else {
    print("No duplicates")
  }
})

Try `lapply(list, function(x) names(which(table(x$X2) > 1)))` — akrun, Jul 01 '19 at 05:37

score 3 · Accepted Answer · answered Jul 01 '19 at 05:39

We could use lapply to find the duplicated indices in the X2 column of each data frame and return the unique duplicated values.

lapply(list_df, function(x) {
   inds <- duplicated(x$X2)
   if(any(inds)) unique(x$X2[inds]) else "No duplicates"
})

#[[1]]
#[1] "No duplicates"

#[[2]]
#[1] 2

#[[3]]
#[1] "No duplicates"

I am using list_df instead of list here, since list is the name of a base R function.
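
As a side note on why the attempt in the question does not behave as intended, here is a minimal sketch. The data frame df is made up for illustration and mirrors the second element of the list. Besides the unquoted X2 and the missing closing bracket, single-bracket indexing returns a one-column data frame, so length() counts columns rather than rows.

```r
# Hypothetical data frame mirroring the second list element
df <- data.frame(X1 = c("a", "d", "g", "j"), X2 = c(1L, 2L, 2L, 3L))

# `[` with a quoted name keeps the data frame structure, and
# length() of a data frame is its number of columns
n_cols_single   <- length(df["X2"])          # 1
n_cols_unique   <- length(unique(df["X2"]))  # still 1, so the comparison never differs

# `$` (or `[[`) extracts the column as a plain vector
n_values        <- length(df$X2)             # 4
n_unique_values <- length(unique(df$X2))     # 3

# Values that occur more than once
dup_values <- unique(df$X2[duplicated(df$X2)])
dup_values
```

With the column extracted as a vector, the length comparison from the question would work, although checking any(duplicated(...)) directly is shorter.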

akrun · Answer 2 · 2019-07-01T06:46:46.750

We can use table to find the frequency of the values in column 'X2' and extract the names of the entries whose frequency is greater than 1. Note that names() returns the values as character strings.

lapply(list, function(x) {
   x1 <- names(which(table(x$X2) > 1))
   if (length(x1) == 0) "No duplicates" else x1
})
#[[1]]
#[1] "No duplicates"

#[[2]]
#[1] "2"

#[[3]]
#[1] "No duplicates"

Or using duplicated

lapply(list, function(x)
   unique(x$X2[duplicated(x$X2) | duplicated(x$X2, fromLast = TRUE)]))
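
To illustrate what the two duplicated calls contribute, here is a small sketch on plain vectors (x and y are made-up examples, not objects from the question). Combining both directions flags every occurrence of a repeated value. After unique() the result holds the same set of values as duplicated alone, though possibly in a different order.

```r
x <- c(1L, 2L, 2L, 3L)

fwd <- duplicated(x)                   # flags the second and later occurrences
bwd <- duplicated(x, fromLast = TRUE)  # flags all but the last occurrence
all_dup <- fwd | bwd                   # flags every occurrence of a repeated value

every_occurrence <- x[all_dup]         # both 2s
dup_values <- unique(every_occurrence)

# With no repeated values the result is an empty vector, not a message
y <- 1:4
no_dups <- unique(y[duplicated(y) | duplicated(y, fromLast = TRUE)])
```

Unlike the table version above, this returns a zero-length vector rather than "No duplicates" when nothing repeats, so a caller may want to test length() on each element.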

Another option is to extract the column from each data frame, stack the results into a single two-column data frame, and get the indices of the duplicated elements with table and which.

which(table(stack(setNames(lapply(list, `[[`, "X2"),
      seq_along(list)))[2:1]) > 1, arr.ind = TRUE)
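
The one-liner above may be easier to follow when split into steps. This is a sketch under the assumption that the list is named list_df and contains only the X2 column (both simplifications are mine). The intermediate names cols, long, counts and hits are made up for illustration.

```r
# Simplified stand-in for the question's data: only the X2 column is kept
list_df <- list(
  data.frame(X2 = 1:4),
  data.frame(X2 = c(1L, 2L, 2L, 3L)),
  data.frame(X2 = 1:4)
)

# 1. Pull X2 out of each data frame and name the pieces "1", "2", "3"
cols <- setNames(lapply(list_df, `[[`, "X2"), seq_along(list_df))

# 2. stack() gives a long data frame with columns `values` and `ind`
long <- stack(cols)

# 3. Cross-tabulate: rows are the data frame index, columns the X2 values
counts <- table(long[2:1])

# 4. Positions in the table where a value occurs more than once
hits <- which(counts > 1, arr.ind = TRUE)

# The second column of `hits` is a column position in `counts`,
# so map it back to the actual X2 value through the column names
dup_in_df <- rownames(counts)[hits[, 1]]
dup_value <- colnames(counts)[hits[, 2]]
```

Here the position and the value happen to coincide, but in general the positions need to be translated through colnames(counts) as in the last line.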

Or another option is

library(tidyverse)
map(list, ~ .x %>%
              count(X2) %>%
              filter(n > 1) %>%
              pull(X2))

data

list <- list(structure(list(X1 = c("a", "d", "g", "j"), X2 = 1:4, X3 = c("b", 
"e", "h", "k"), X4 = c("c", "f", "i", "l")), class = "data.frame", row.names = c(NA, 
-4L)), structure(list(X1 = c("a", "d", "g", "j"), X2 = c(1L, 
2L, 2L, 3L), X3 = c("b", "e", "h", "k"), X4 = c("c", "f", "i", 
"l")), class = "data.frame", row.names = c(NA, -4L)), structure(list(
    X1 = c("a", "d", "g", "j"), X2 = 1:4, X3 = c("b", "e", "h", 
    "k"), X4 = c("c", "f", "i", "l")), class = "data.frame", row.names = c(NA, 
-4L)))
