Data Management in R - tidying data after aggregation

Question

I am having difficulty dealing with this step of data management in R while trying to aggregate data at the monthly level.

I have two examples, one essentially represents where I am stuck at now in my process of cleaning and aggregating, and the second represents what I would like it to look like.

What it looks like now:

month <- c("January", "January", "February", "March", "April", "April", 
"May", "June", "July")
year <- c(2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014)
count1 <- c(3, 0, 1, 2, 0, 8, 1, 1, 1)
count2 <- c(0, 2, 1, 4, 6, 0, 1, 1, 1)
count3 <- c(1, 1, 1, 1, 1, 1, 0, 0, 1)

df <- data.frame(month, year, count1, count2, count3)

What I would like it to look like:

month2 <- c("January", "February", "March", "April", "May", "June", "July")
year2 <- c(2014, 2014, 2014, 2014, 2014, 2014, 2014)
count1a <- c(3, 1, 2, 8, 1, 1, 1)
count2a <- c(2, 1, 4, 6, 1, 1, 1)
count3a <- c(1, 1, 1, 1, 0, 0, 1)

df2 <- data.frame(month2, year2, count1a, count2a, count3a)

As you will notice, I am having some months counted twice and their observations are not on the same row as the other observations that I am using for the same month.

It is somewhat similar. It was not clear how to describe this problem before seeing the answer. — John Stud, Oct 31 '17 at 02:49

akrun · Accepted Answer · 2017-10-31T02:33:42.353

1

After grouping by 'month' and 'year', get the max

library(dplyr)
df %>% 
   group_by(month, year) %>%
   summarise_all(max)
# A tibble: 7 x 5
# Groups:   month [?]
#     month  year count1 count2 count3
#    <fctr> <dbl>  <dbl>  <dbl>  <dbl>
#1    April  2014      8      6      1
#2 February  2014      1      1      1
#3  January  2014      3      2      1
#4     July  2014      1      1      1
#5     June  2014      1      1      0
#6    March  2014      2      4      1
#7      May  2014      1      1      0

If we need to keep the same order, then

df %>% 
   group_by(month = factor(month, levels = unique(month)), year) %>% 
   summarise_all(max)
   #or
   #summarise_all(funs(.[order(-.)][1]))

edited Oct 31 '17 at 02:33

answered Oct 31 '17 at 02:28

akrun

874,273
37
540
662

1

Wow, I did not think of that. This is incredibly HELPFUL! – John Stud Oct 31 '17 at 02:31
1

Yes definitely the order is important, I appreciate it! – John Stud Oct 31 '17 at 02:41

Data Management in R - tidying data after aggregation

1 Answers1