
I have a dataset of 19,000 rows, but only 15,000 unique patient IDs. I want a subset containing only these unique IDs, while keeping the other variables as they are in the original dataset.

patnr   age   (and 25 other variables)
1       20
2       21
3       16
4        5
...
(19,000 rows in total)

How can I do this? So far I only know how to count the unique patient IDs in this dataset, with this command:

length(unique(data$patnr))
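A related count is how many rows a one-row-per-patient subset would drop. The sketch below uses a small made-up data frame (the values are hypothetical, not from the real data); `duplicated()` flags every row whose `patnr` already appeared earlier, so the number of flagged rows equals total rows minus unique IDs.

```r
# Toy stand-in for the real data (hypothetical values)
data <- data.frame(patnr = c(1, 2, 2, 3, 3, 3),
                   age   = c(20, 21, 21, 16, 17, 16))

n_rows  <- nrow(data)                  # total number of rows
n_ids   <- length(unique(data$patnr))  # number of unique patient IDs
n_extra <- sum(duplicated(data$patnr)) # rows repeating an earlier patnr

cat(n_rows, n_ids, n_extra, "\n")
```

On the real data this would report the roughly 4,000 extra rows (19,000 minus 15,000).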
zx8754
karin
  • Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Jun 01 '16 at 11:42
  • If `patnr` is duplicated, which one do you want to keep in the results? – zx8754 Jun 01 '16 at 11:44
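The choice zx8754 asks about can be made explicit in base R. A minimal sketch on a hypothetical toy data frame: `duplicated()` keeps the first row per `patnr` by default, and its `fromLast = TRUE` argument keeps the last one instead.

```r
# Hypothetical toy data: patient 2 has two rows with different ages
df <- data.frame(patnr = c(1, 2, 2, 3),
                 age   = c(20, 21, 22, 16))

# Keep the first row for each patient ID
first <- df[!duplicated(df$patnr), ]

# Keep the last row for each patient ID (scan from the end)
last <- df[!duplicated(df$patnr, fromLast = TRUE), ]

print(first$age)  # 20 21 16
print(last$age)   # 20 22 16
```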

1 Answer


Suppose your data.frame is called df. You can use duplicated() to keep only the first row for each patient ID:

dfUnique <- df[!duplicated(df$patnr), ]
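Indexing rows by the unique ID values themselves (e.g. `df[unique(df$patnr), ]`) treats the IDs as row numbers, which only looks right when IDs happen to match row positions. A small sketch with hypothetical IDs that do not match row numbers shows the difference from the `!duplicated()` approach:

```r
# Hypothetical data: IDs are not 1..n, so they are not valid row numbers
df <- data.frame(patnr = c(101, 205, 205, 310),
                 age   = c(20, 21, 22, 16))

# Uses IDs 101, 205, 310 as row positions -> out of range -> all-NA rows
by_value <- df[unique(df$patnr), ]

# Logical index: TRUE for the first occurrence of each patnr
dedup <- df[!duplicated(df$patnr), ]

print(by_value)
print(dedup)
```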

Note that keeping one row per patient drops roughly 4,000 rows (19,000 rows minus 15,000 unique IDs). If the other variables differ between a patient's first and later rows, the later values are lost.
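Before dropping rows, it may be worth checking which patients would actually lose information. A minimal sketch on a hypothetical toy data frame: first remove rows that are exact copies across all columns, then any `patnr` that still appears more than once must differ in at least one variable.

```r
# Hypothetical toy data: patient 2 has an exact duplicate row,
# patient 3 has two rows with different ages
df <- data.frame(patnr = c(1, 2, 2, 3, 3),
                 age   = c(20, 21, 21, 16, 17))

# Drop rows that repeat an earlier row in every column
distinct_rows <- df[!duplicated(df), ]

# IDs that still occur more than once have conflicting values
conflicting <- unique(distinct_rows$patnr[duplicated(distinct_rows$patnr)])

print(conflicting)  # 3
```

If `conflicting` is empty, keeping the first row per patient loses nothing.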

lmo