R - Combine names with duplicate dates into single row

Question

I have a data frame like this:

data.frame(name = c("a", "b", "c", "d"),
           date = c("2018-09", "2018-10", "2018-9", "2018-11"))

and I want to combine any of the names that share a date with another into a single row separated by a comma with the second column showing the date they share

data.frame(name = c("a,c", "b", "d"),
           date = c("2018-09", "2018-10", "2018-11"))

@RichScriven Provided it's not a typo in the sample data, this is not quite a dupe, since OP needs to take care of different date formats prior to aggregating (e.g. `"2018-09"` vs. `"2018-9"`). — Maurits Evers, Sep 25 '18 at 22:02

score 1 · Answer 1 · answered Sep 25 '18 at 21:59

I assume the different date formats from your sample data are real, i.e. you have data where "2018-09" and "2018-9" correspond to the same month+year.

In that case you can do the following

df %>%
    mutate(date = as.Date(paste0(date, "-01"), "%Y-%m-%d")) %>%
    group_by(date) %>%
    summarise(name = toString(name)) %>%
    mutate(date = format(date, format = "%Y-%m"))
#    # A tibble: 3 x 2
#  date    name
#  <chr>   <chr>
#1 2018-09 a, c
#2 2018-10 b
#3 2018-11 d

Sample data

df <- data.frame(name = c("a", "b", "c", "d"),
       date = c("2018-09", "2018-10", "2018-9", "2018-11"))

divibisan · Answer 2 · 2018-09-26T15:29:16.687

You can do this in dplyr with group_by and summarize.

Since you're working with dates, it's important that you get them into a standardized format (which they aren't in your example) so that identical dates are considered identical.

The lubridate package is, in my opinion, the easiest way to do this. In the below example, we parse the date variable as a 4-digit year Y and decimal month m, then group by identical dates as before:

df2 <- data.frame(name = c("a", "b", "c", "d"),
           date = c("2018-09", "2018-10", "2018-9", "2018-11"))
df2 %>%
    mutate(date = lubridate::parse_date_time(date, 'Ym')) %>%
    group_by(date) %>%
    summarise(name = paste0(name, collapse = ','))

# A tibble: 3 x 2
  date                name 
  <dttm>              <chr>
1 2018-09-01 00:00:00 a,c  
2 2018-10-01 00:00:00 b    
3 2018-11-01 00:00:00 d

score 0 · Answer 3 · answered Sep 25 '18 at 21:59

0

Using R base aggregate and df from @divibisan answer:

> aggregate(name~date, data=df,  paste, collapse = ",")
     date name
1 2018-09  a,c
2 2018-10    b
3 2018-11    d

answered Sep 25 '18 at 21:59

Jilber Urbina

58,147
10
114
138

R - Combine names with duplicate dates into single row

3 Answers3

Sample data