
I have a data frame in which each row has 7 numbers. I would like to use a for or while loop to find the rows that are identical to another row.

data frame:

   1st 2nd 3rd 4th 5th 6th 7th
1    5  32  34  38  39  49   8
2   10  20  21  33  40  44  34
3   10  20  26  28  35  48  13
4   14  19  23  36  44  46   7
5    9  24  25  27  36  38  41
6    7  13  14  20  29  32  28
7   11  22  24  28  29  38  20
8    1  11  29  33  36  44  37
9    9  12  25  31  43  44   5
10   1   5   6  31  39  46  44
11   14  19  23  36  44  46   7

desired output:

 4   14  19  23  36  44  46   7
11   14  19  23  36  44  46   7

I tried the following code, but it does not work:

lapply(df, function(i) all(df[i,] == df[1:nrow(df),]))

Please advise, thanks.
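For reference, here is a minimal sketch of the nested for loop the question describes. The attempt above calls lapply(df, ...), which iterates over the columns of df, so i is a whole column of numbers rather than a row index. Looping over the row positions avoids that. The helper name find_identical_rows() is hypothetical, and the data is re-entered without the header, so the columns are named V1 to V7.

```r
# Sample data from the question, read without the header (columns become V1..V7)
df <- read.table(text = "
 5 32 34 38 39 49  8
10 20 21 33 40 44 34
10 20 26 28 35 48 13
14 19 23 36 44 46  7
 9 24 25 27 36 38 41
 7 13 14 20 29 32 28
11 22 24 28 29 38 20
 1 11 29 33 36 44 37
 9 12 25 31 43 44  5
 1  5  6 31 39 46 44
14 19 23 36 44 46  7")

# Hypothetical helper: indices of rows that are identical to at least one other row
find_identical_rows <- function(df) {
  n <- nrow(df)
  if (n < 2) return(integer(0))
  is_dup <- logical(n)                  # TRUE once a row has matched another row
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {              # visit each unordered pair of rows once
      if (all(unlist(df[i, ]) == unlist(df[j, ]))) {
        is_dup[c(i, j)] <- TRUE         # flag both members of the matching pair
      }
    }
  }
  which(is_dup)
}

df[find_identical_rows(df), ]
```

This compares every pair of rows, so the work grows with the square of the number of rows. For large data frames, the duplicated() approach from the answers avoids the explicit loop.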

Peter Chung
  • Do you need `lapply(seq_len(nrow(df)), function(i) lapply(seq_len(nrow(df)), function(j) all(df[i,] == df[j,])))` or using `outer(seq_len(nrow(df)), seq_len(nrow(df)), FUN = Vectorize(function(i, j) all(df[i,] == df[j,])))` – akrun Jul 15 '18 at 14:38
  • possible dupe: https://stackoverflow.com/questions/12495345/find-indices-of-duplicated-rows – YOLO Jul 15 '18 at 14:48
  • Try `lapply(seq_len(nrow(df)), function(i) {i1 <- rowSums(df[i,][col(df)] == df)== ncol(df); if(sum(i1) >1) df[i1,]})` – akrun Jul 15 '18 at 14:52
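The outer() suggestion in the first comment builds an n-by-n logical matrix whose [i, j] entry is TRUE when rows i and j are identical. Below is a sketch of how the matching pairs could then be pulled out of that matrix. It keeps only the upper triangle, so a row is never paired with itself and each pair is reported once. The data is re-entered without the header, so the columns are V1 to V7.

```r
df <- read.table(text = "
 5 32 34 38 39 49  8
10 20 21 33 40 44 34
10 20 26 28 35 48 13
14 19 23 36 44 46  7
 9 24 25 27 36 38 41
 7 13 14 20 29 32 28
11 22 24 28 29 38 20
 1 11 29 33 36 44 37
 9 12 25 31 43 44  5
 1  5  6 31 39 46 44
14 19 23 36 44 46  7")

n <- nrow(df)
# same[i, j] is TRUE when every value in row i equals the matching value in row j
same <- outer(seq_len(n), seq_len(n),
              FUN = Vectorize(function(i, j) all(df[i, ] == df[j, ])))

# upper.tri() drops the diagonal (row vs. itself) and the mirrored lower half
pairs <- which(same & upper.tri(same), arr.ind = TRUE)
pairs
```

Like an explicit loop, this makes n² row comparisons, so it is best suited to small data frames.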

2 Answers


A base R option would be

unique(Filter(Negate(is.null), lapply(seq_len(nrow(df)), function(i) {
       i1 <- rowSums(df[i,][col(df)] == df)== ncol(df)
       if(sum(i1) >1) df[i1,]}) ))
#[[1]]
#    1st  2nd  3rd  4th  5th  6th  7th
#4    14   19   23   36   44   46    7
#11   14   19   23   36   44   46    7
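The key step in the code above is rowSums(df[i,][col(df)] == df) == ncol(df). It compares row i against every row and keeps the rows where all ncol(df) columns match. Here is a sketch of the same step written against a matrix. The helper rows_matching() is hypothetical, and it uses t() plus R's recycling in place of the col() indexing trick (data re-entered without the header):

```r
df <- read.table(text = "
 5 32 34 38 39 49  8
10 20 21 33 40 44 34
10 20 26 28 35 48 13
14 19 23 36 44 46  7
 9 24 25 27 36 38 41
 7 13 14 20 29 32 28
11 22 24 28 29 38 20
 1 11 29 33 36 44 37
 9 12 25 31 43 44  5
 1  5  6 31 39 46 44
14 19 23 36 44 46  7")
m <- as.matrix(df)   # all columns are integers, so this is an integer matrix

# Hypothetical helper: TRUE for every row whose values all equal row i's values
rows_matching <- function(m, i) {
  # t(m) has one column per original row; m[i, ] is recycled down each column
  colSums(t(m) == m[i, ]) == ncol(m)
}

# Mirror the answer's logic: keep only groups that contain more than one row
groups <- lapply(seq_len(nrow(m)), function(i) {
  hit <- rows_matching(m, i)
  if (sum(hit) > 1) which(hit)
})
res <- unique(Filter(Negate(is.null), groups))
res
```

Rows 4 and 11 each produce the same group, which is why unique() is needed to report it only once.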

If we are only interested in the duplicated rows themselves, duplicated() can be applied in both directions:

df[duplicated(df)|duplicated(df, fromLast = TRUE),]
#    1st  2nd  3rd  4th  5th  6th  7th
#4    14   19   23   36   44   46    7
#11   14   19   23   36   44   46    7
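Both calls are needed because duplicated() marks only the second and later copies of a row, while duplicated(fromLast = TRUE) marks only the earlier copies. Combining them with | flags every copy. A small sketch that shows each vector as row indices (data re-entered without the header):

```r
df <- read.table(text = "
 5 32 34 38 39 49  8
10 20 21 33 40 44 34
10 20 26 28 35 48 13
14 19 23 36 44 46  7
 9 24 25 27 36 38 41
 7 13 14 20 29 32 28
11 22 24 28 29 38 20
 1 11 29 33 36 44 37
 9 12 25 31 43 44  5
 1  5  6 31 39 46 44
14 19 23 36 44 46  7")

dup_above <- duplicated(df)                   # an identical row appears earlier
dup_below <- duplicated(df, fromLast = TRUE)  # an identical row appears later

which(dup_above)              # 11: row 11 repeats row 4
which(dup_below)              # 4: row 4 is repeated by row 11
which(dup_above | dup_below)  # 4 11: every copy
```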
akrun

An option using dplyr::group_by_all() can be very handy:

library(dplyr)

df %>% group_by_all() %>%
  filter(n()>1)  # n()>1 will make sure to return only rows having duplicates

# # A tibble: 2 x 7
# # Groups: X1st, X2nd, X3rd, X4th, X5th, X6th, X7th [1]
#    X1st  X2nd  X3rd  X4th  X5th  X6th  X7th
#   <int> <int> <int> <int> <int> <int> <int>
# 1    14    19    23    36    44    46     7
# 2    14    19    23    36    44    46     7
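Here, group_by_all() groups by every column, and filter(n() > 1) keeps only the groups that contain more than one row. In dplyr 1.0.0 and later, group_by_all() is superseded, and group_by(across(everything())) is the equivalent spelling. The same idea can be sketched in base R: build one key string per row, then count the rows per key with ave(). The underscore separator is an assumption that is safe here because every value is an integer.

```r
df <- read.table(text = "
 5 32 34 38 39 49  8
10 20 21 33 40 44 34
10 20 26 28 35 48 13
14 19 23 36 44 46  7
 9 24 25 27 36 38 41
 7 13 14 20 29 32 28
11 22 24 28 29 38 20
 1 11 29 33 36 44 37
 9 12 25 31 43 44  5
 1  5  6 31 39 46 44
14 19 23 36 44 46  7")

# One key per row: all of the row's values pasted together
key <- do.call(paste, c(df, sep = "_"))

# For each row, the number of rows sharing its key (the analogue of n() per group)
group_size <- ave(seq_along(key), key, FUN = length)

res <- df[group_size > 1, ]
res
```

Unlike the dplyr version, this returns a plain data.frame and keeps the original row names (4 and 11).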

Data:

df <- read.table(text = 
"1st 2nd 3rd 4th 5th 6th 7th
1    5  32  34  38  39  49   8
2   10  20  21  33  40  44  34
3   10  20  26  28  35  48  13
4   14  19  23  36  44  46   7
5    9  24  25  27  36  38  41
6    7  13  14  20  29  32  28
7   11  22  24  28  29  38  20
8    1  11  29  33  36  44  37
9    9  12  25  31  43  44   5
10   1   5   6  31  39  46  44
11   14  19  23  36  44  46   7",
header = TRUE)
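A note on the column names: read.table() defaults to check.names = TRUE, which passes the header through make.names() and turns 1st into X1st. That is why the tibble output above shows X1st to X7th. Passing check.names = FALSE keeps the names as written. Here is a short check on a trimmed version of the data:

```r
txt <- "1st 2nd 3rd
1   5  32  34
2  10  20  21"

# Default: names are made syntactically valid, so 1st becomes X1st
default_names <- names(read.table(text = txt, header = TRUE))

# check.names = FALSE keeps the header exactly as written
kept_names <- names(read.table(text = txt, header = TRUE, check.names = FALSE))

default_names
kept_names
```

Non-syntactic names such as 1st then need backticks in code, for example df$`1st`.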
MKR