Eliminate duplicate dates in dataframe in R

Question

I have the data frame below. How can I get a new data frame that removes the duplicated dates (6/15/2018 and 6/28/2018) so that each date appears only once?

 Date        Hrs
6/14/2018   364.8
6/15/2018   372.6
6/15/2018   381.9
6/21/2018   383.3
6/22/2018   394.5
6/25/2018   411
6/28/2018   423.9
6/28/2018   424.9

How do you tell which of the two values for 6/15 you want to keep? — MrFlick, Jul 12 '18 at 16:06
Which Hrs record do you want to remove where there's a duplicate? And what have you tried so far? — sm925, Jul 12 '18 at 16:07
Possible duplicate of [Remove duplicated rows](https://stackoverflow.com/questions/13967063/remove-duplicated-rows) — gos, Jul 12 '18 at 16:18

score 2 · Answer 1 · answered Jul 12 '18 at 16:17


Assuming you just want to keep the first row of two duplicates:

# duplicated() flags every repeat of a Date after its first occurrence
df <- df[!duplicated(df$Date), ]

df

##        Date   Hrs
## 1 6/14/2018 364.8
## 2 6/15/2018 372.6
## 4 6/21/2018 383.3
## 5 6/22/2018 394.5
## 6 6/25/2018 411.0
## 7 6/28/2018 423.9
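A self-contained sketch of the same idea on the question's data, rebuilt with read.table as in the Note of the other answer so it runs on its own. The names `DF` and `first_rows` are illustrative; resetting the row names is an optional extra, not part of the original answer.

```r
# Rebuild the question's data (columns Date and Hrs)
Lines <- "Date Hrs
6/14/2018 364.8
6/15/2018 372.6
6/15/2018 381.9
6/21/2018 383.3
6/22/2018 394.5
6/25/2018 411
6/28/2018 423.9
6/28/2018 424.9"
DF <- read.table(text = Lines, header = TRUE)

# Negating duplicated() keeps the first row for each Date
first_rows <- DF[!duplicated(DF$Date), ]

# Optional: renumber rows 1..n instead of keeping the original row names
rownames(first_rows) <- NULL
first_rows
```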

answered by gos

Dear gos, can you help me if I want to keep the second row of the two duplicates instead? – Pan, Apr 12 '21 at 00:29
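One way to keep the last row of each duplicated date instead (a sketch, not a reply from the thread) is the `fromLast` argument of base R's `duplicated()`, which scans from the end so that earlier repeats are the ones flagged. The data is rebuilt here so the block runs on its own; `DF` and `last_rows` are illustrative names.

```r
Lines <- "Date Hrs
6/14/2018 364.8
6/15/2018 372.6
6/15/2018 381.9
6/21/2018 383.3
6/22/2018 394.5
6/25/2018 411
6/28/2018 423.9
6/28/2018 424.9"
DF <- read.table(text = Lines, header = TRUE)

# fromLast = TRUE treats the last occurrence of each Date as the original,
# so the earlier rows for 6/15 and 6/28 are the ones dropped
last_rows <- DF[!duplicated(DF$Date, fromLast = TRUE), ]
last_rows
```

The surviving rows keep their original order; only the earlier duplicates are removed.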

score 0 · Accepted Answer · answered Jul 12 '18 at 16:55

1) zoo. You can create a zoo series without duplicates by calling read.zoo with an aggregate function, as below. This example assumes you want the last of any duplicates, but mean, median, function(x) head(x, 1) or other functions can be used for other aggregates.

library(zoo) 
z <- read.zoo(DF, format = "%m/%d/%Y", aggregate = function(x) tail(x, 1))

Now plot(z), lattice::xyplot(z) or ggplot2::autoplot(z) will plot it, fortify.zoo(z) will convert it to a data frame, etc.

2) base. In base R we can use aggregate like this:

DF2 <- transform(DF, Date = as.Date(Date, "%m/%d/%Y"))
aggregate(Hrs ~ Date, DF2, function(x) tail(x, 1))

Any of the aggregate functions mentioned in (1) can be used in place of tail.
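For example, a sketch that swaps in `mean` (averaging the duplicated readings) and `function(x) head(x, 1)` (keeping the first one) in the base `aggregate` call. The data from the Note is rebuilt so the block is self-contained; `avg` and `first_kept` are illustrative names.

```r
Lines <- "Date Hrs
6/14/2018 364.8
6/15/2018 372.6
6/15/2018 381.9
6/21/2018 383.3
6/22/2018 394.5
6/25/2018 411
6/28/2018 423.9
6/28/2018 424.9"
DF <- read.table(text = Lines, header = TRUE)

# Convert the m/d/Y strings to Date so the result sorts chronologically
DF2 <- transform(DF, Date = as.Date(Date, "%m/%d/%Y"))

# Average the Hrs values that share a date
avg <- aggregate(Hrs ~ Date, DF2, mean)

# Keep the first Hrs value for each date
first_kept <- aggregate(Hrs ~ Date, DF2, function(x) head(x, 1))
avg
first_kept
```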

## Note

Lines <- " Date        Hrs
6/14/2018   364.8
6/15/2018   372.6
6/15/2018   381.9
6/21/2018   383.3
6/22/2018   394.5
6/25/2018   411
6/28/2018   423.9
6/28/2018   424.9"
DF <- read.table(text = Lines, header = TRUE)

In the data frame the dates are in Y-m-d format, and I get the following error from `z <- read.zoo(nonduplicate, format = "%Y/%m/%d", aggregate = function(x) tail(x, 1))`: `Error in read.zoo(nonduplicate, format = "%Y/%m/%d", aggregate = function(x) tail(x, : index has bad entries at data rows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14` — Alli, Jul 12 '18 at 17:50
If you paste the code in the Note at the end into R and then paste the code in the answer then it works. If I run this which creates DF3 with yyyy/mm/dd dates it also works. `DF3 <- transform(DF, Date = format(as.Date(Date, "%m/%d/%Y"), "%Y/%m/%d")); z <- read.zoo(DF3, format = "%Y/%m/%d", aggregate = function(x) tail(x, 1))` You will need to provide a reproducible example to say more. — G. Grothendieck, Jul 12 '18 at 19:07
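One possible cause, assuming the column holds text such as "2018-06-14" (the `nonduplicate` data frame was never shown), is that the format string must match the separators in the data exactly: "%Y/%m/%d" cannot parse hyphenated dates, so every value becomes NA. A minimal base R sketch of the mismatch; the vector `x` is hypothetical sample data.

```r
# Hypothetical year-month-day strings with hyphens
x <- c("2018-06-14", "2018-06-15")

# Slashes in the format do not match the hyphens in the text: all NA
bad <- as.Date(x, format = "%Y/%m/%d")

# Matching separators parse correctly
good <- as.Date(x, format = "%Y-%m-%d")
bad
good
```

If that is the situation, passing `format = "%Y-%m-%d"` to read.zoo would be the corresponding change.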
