Select subset of unique patient ID

Question

I have a dataset of 19000 rows. The number of unique patient IDs is 15000. I want a subset with one row per unique ID, but with the other variables as in the original dataset

patnr      age    and 25 other variables
1          20
2          21
3          16
4           5
19000

How can I do this? For now I can only see how many unique patient IDs are in the dataset, with this command:

length(unique(data$patnr))

Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — zx8754, Jun 01 '16 at 11:42
If `patnr` is duplicated, which one do you want to keep in the results? — zx8754, Jun 01 '16 at 11:44

score 0 · Answer 1 · answered Jun 01 '16 at 11:43

Let's say your data.frame is called df. Indexing with unique(df$patnr) would treat the ID values as row numbers, so use duplicated instead to select the first row in which each patient ID appears:

dfUnique <- df[!duplicated(df$patnr), ]

Note that this will drop roughly 4,000 rows (19000 - 15000), and you lose whatever information those rows hold if the other variables differ between observations of the same patient.
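
To make the row-dropping behaviour concrete, here is a minimal sketch on a toy data.frame. The values are made up for illustration and are not from the asker's dataset; only the column names patnr and age come from the question. It also shows one possible answer to the comment about which duplicate to keep, using the fromLast argument of duplicated.

```r
# Toy data: hypothetical values, only the column names come from the question
df <- data.frame(
  patnr = c(1, 2, 2, 3, 1),
  age   = c(20, 21, 22, 16, 20)
)

# Number of distinct patients, as computed in the question
n_patients <- length(unique(df$patnr))

# Keep the first row seen for each patnr; all other columns are retained
dfUnique <- df[!duplicated(df$patnr), ]

# Alternative: keep the last row seen for each patnr instead
dfLast <- df[!duplicated(df$patnr, fromLast = TRUE), ]

print(dfUnique)
print(dfLast)
```

Either way the result has one row per patient, so nrow(dfUnique) matches length(unique(df$patnr)). Which variant is appropriate depends on whether the first or the last observation per patient is the one you care about.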
