I have a data frame with 100 rows. Here is a small sample of the data:

df <- read.table(text = "Id   Colour  Class   val Group
'P' 'NA'    'NA'    'NA'    '1'
'Q' 'NA'    'NA'    'NA'    '2'
'12'    'Red'   'A' '12'    '3'
'P' 'NA'    'NA'    'NA'    '1'
'Q' 'NA'    'NA'    'NA'    '2'
'Z' 'Yellow'    'M' '9' '20'
'P' 'Blue'  'M' '30'    '50'
", header = TRUE)

As you can see, the P and Q rows are repeated. I want to remove the duplicated P and Q rows (rows 4 and 5) to get this outcome:

      Id Colour Class val Group
    1  P   <NA>  <NA>  NA     1
    2  Q   <NA>  <NA>  NA     2
    3 12    Red     A  12     3
    6  Z Yellow     M   9    20
    7  P   Blue     M  30    50

The following code gives me that outcome. However, it does not help in general: the Id values are sometimes different, and it is tedious to find the row numbers to remove by hand. Is there a better way?

df[-c(4,5), ]
user330

1 Answer

You can use unique(), which is part of base R:

unique(df)

unique() compares entire rows, so this reduces the two 'Q' rows to one and the three 'P' rows to two (the 'P' row with Colour 'Blue' differs from the others and is kept), which matches the output you want.
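To see which rows would be dropped before removing them, duplicated() can be used alongside unique(). The sketch below rebuilds the sample data from the question; the variable names dup and deduped are only illustrative.

```r
# Rebuild the sample data from the question.
df <- read.table(text = "Id Colour Class val Group
'P' 'NA' 'NA' 'NA' '1'
'Q' 'NA' 'NA' 'NA' '2'
'12' 'Red' 'A' '12' '3'
'P' 'NA' 'NA' 'NA' '1'
'Q' 'NA' 'NA' 'NA' '2'
'Z' 'Yellow' 'M' '9' '20'
'P' 'Blue' 'M' '30' '50'", header = TRUE)

# duplicated() flags every row that is identical to an earlier row.
dup <- duplicated(df)
which(dup)  # positions of the repeated rows (here rows 4 and 5)

# Dropping the flagged rows keeps the first occurrence of each row,
# which is what unique(df) does for a data frame.
deduped <- df[!dup, ]
deduped
```

The original row names (1, 2, 3, 6, 7) are kept. If consecutive row names are preferred, rownames(deduped) <- NULL resets them.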

pwilcox