1

I have a data frame like this:

data.frame(name = c("a", "b", "c", "d"),
           date = c("2018-09", "2018-10", "2018-9", "2018-11"))

and I want to combine any of the names that share a date with another into a single row separated by a comma with the second column showing the date they share

data.frame(name = c("a,c", "b", "d"),
           date = c("2018-09", "2018-10", "2018-11"))
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
pete5275
  • 11
  • 2
  • 1
    @RichScriven Provided it's not a typo in the sample data, this is not quite a dupe, since OP needs to take care of different date formats prior to aggregating (e.g. `"2018-09"` vs. `"2018-9"`). – Maurits Evers Sep 25 '18 at 22:02

3 Answers3

1

I assume the different date formats from your sample data are real, i.e. you have data where "2018-09" and "2018-9" correspond to the same month+year.

In that case you can do the following

df %>%
    mutate(date = as.Date(paste0(date, "-01"), "%Y-%m-%d")) %>%
    group_by(date) %>%
    summarise(name = toString(name)) %>%
    mutate(date = format(date, format = "%Y-%m"))
#    # A tibble: 3 x 2
#  date    name
#  <chr>   <chr>
#1 2018-09 a, c
#2 2018-10 b
#3 2018-11 d

Sample data

df <- data.frame(name = c("a", "b", "c", "d"),
       date = c("2018-09", "2018-10", "2018-9", "2018-11"))
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
0

You can do this in dplyr with group_by and summarize.

Since you're working with dates, it's important that you get them into a standardized format (which they aren't in your example) so that identical dates are considered identical.

The lubridate package is, in my opinion, the easiest way to do this. In the below example, we parse the date variable as a 4-digit year Y and decimal month m, then group by identical dates as before:

df2 <- data.frame(name = c("a", "b", "c", "d"),
           date = c("2018-09", "2018-10", "2018-9", "2018-11"))
df2 %>%
    mutate(date = lubridate::parse_date_time(date, 'Ym')) %>%
    group_by(date) %>%
    summarise(name = paste0(name, collapse = ','))

# A tibble: 3 x 2
  date                name 
  <dttm>              <chr>
1 2018-09-01 00:00:00 a,c  
2 2018-10-01 00:00:00 b    
3 2018-11-01 00:00:00 d    
divibisan
  • 11,659
  • 11
  • 40
  • 58
0

Using R base aggregate and df from @divibisan answer:

> aggregate(name~date, data=df,  paste, collapse = ",")
     date name
1 2018-09  a,c
2 2018-10    b
3 2018-11    d
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138