
I have an issue selecting duplicate rows in R. My data frame has 14 columns and 1 million rows. I need to compare rows, i.e. find identical rows, which count as duplicates, and I want to retrieve those duplicate rows. My data frame looks like this: Data frame sample

The last two rows are identical, so I need to mark them with a flag value of 1. I don't know how to start with this.

I have tried this code:

df <- unique(data[,1:97])  # this gives me the unique rows, not the number of duplicates
dim(data[duplicated(data),])[1]  # this gives me the number of duplicates, but not their ids

I need to know the ids of the duplicate rows.

My intention is to check each row and return the total number of duplicate rows or their line numbers.

Sharmi
  • @dww I have already looked at that question. It removes particular row and column values, but I need to compare entire rows. I have linked my data sample. – Sharmi Aug 15 '18 at 22:15

1 Answer


Look into the duplicated() function. It returns a logical vector marking rows that repeat an earlier row, so it can be used to remove the duplicated rows or, inversely, to keep only them.
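As a rough illustration of that suggestion, here is a minimal sketch on a toy data frame. The data frame `df` and its columns `id` and `value` are made up for the example and are not the asker's data. It assumes that "duplicate" means every column matches exactly, and it uses the `fromLast` argument of duplicated() so that both the first copy and the later copies get flagged.

```r
# Toy data frame standing in for the real data (hypothetical values).
df <- data.frame(
  id    = c(1, 2, 3, 3),
  value = c("a", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# duplicated() marks only the second and later copies of a row.
later_copies <- duplicated(df)

# OR-ing it with a scan from the end marks every row that has an identical twin.
all_copies <- later_copies | duplicated(df, fromLast = TRUE)

# Row numbers of all rows that belong to a duplicate group.
dup_rows <- which(all_copies)

# Number of repeated rows (copies beyond the first occurrence).
n_repeats <- sum(later_copies)

# Flag column: 1 for rows that are part of a duplicate group, 0 otherwise.
df$flag <- as.integer(all_copies)
```

Here `dup_rows` holds the line numbers and `df$flag` holds the 0/1 marker asked for in the question. If duplicated() reports FALSE for rows that look identical, it may be worth checking for differences that are hard to see, such as trailing whitespace or tiny numeric differences.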

SmitM
  • I applied the duplicated() function to my data set, but it returns FALSE even though duplicate rows exist. – Sharmi Aug 15 '18 at 22:02