
I have the following CSV file loaded as a data frame called key: https://www.dropbox.com/s/vy7bxlh2oyvh141/key.csv?dl=0

When I run:

dup <- key[which(duplicated(key$Genotype)), ]

I get a data frame with 100 rows, most of which actually appear unique:

> head(dup)
       Pot  Genotype
193 142698 PI-177384
194 142700 PI-178900
195 142702 PI-179275
196 142704 PI-179276
197 142706 PI-179277
198 142712 PI-179690

Does anyone know the reason for this?

shbrainard
    `duplicated` only returns TRUE the second time a value appears. For example `duplicated(c(1,2,2))` returns FALSE, FALSE, TRUE. The first 2 hasn't been seen before but the second is a duplicate. – MrFlick Mar 04 '20 at 20:06
  • Possible duplicate (or at least very related): https://stackoverflow.com/questions/12495345/find-indices-of-duplicated-rows – MrFlick Mar 04 '20 at 20:09
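To illustrate MrFlick's point: `duplicated()` flags only the second and later occurrences of each value, so the rows it returns look unique on their own because their earlier twins aren't included. A minimal sketch with a made-up stand-in data frame (the real key.csv isn't reproduced here):

    # Toy stand-in for the real key data frame
    toy <- data.frame(
      Pot      = c(1, 2, 3, 4, 5),
      Genotype = c("PI-177384", "PI-178900", "PI-177384", "PI-179275", "PI-178900")
    )

    # Flags only the *second* (and later) occurrence of each Genotype
    duplicated(toy$Genotype)
    #> FALSE FALSE  TRUE FALSE  TRUE

    # Rows returned by the original code: each has an earlier twin that isn't shown
    toy[duplicated(toy$Genotype), ]

    # To see *every* row whose Genotype occurs more than once (both copies):
    toy[duplicated(toy$Genotype) | duplicated(toy$Genotype, fromLast = TRUE), ]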

1 Answer


If you want a data frame with the duplicates removed (one row per Genotype, keeping the first occurrence), you'll have to alter the code to include a !

This should work:

dup <- key[which(!duplicated(key$Genotype)), ]
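Note that `!duplicated()` keeps the first occurrence of each Genotype, i.e. it de-duplicates the data frame rather than listing the duplicates themselves. A short sketch of how you might check the result, assuming `key` is already loaded (the toy frame above works the same way under its own name):

    # Keep one row per Genotype (first occurrence) -- the which() wrapper is
    # harmless but unnecessary; logical indexing works directly
    dedup <- key[!duplicated(key$Genotype), ]

    nrow(dedup)                    # number of distinct Genotype values
    anyDuplicated(dedup$Genotype)  # 0 -- confirms no repeated genotypes remain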
Matt
  • Awesome! Feel free to click the green checkmark to accept the answer if it solved your question. – Matt Mar 04 '20 at 20:18