0

I have the following somewhat complicated problem. Consider the following two data frames

df
  ID cat month_1 month_2
1  1   A       3       1
2  1   B       2       2
3  1   C       3       4
4  2   A       3       6
5  3   D       5       2
6  3   B       2       9
> df2
  ID month_number
1  1      month_1
2  1      month_2
3  2      month_1
4  2      month_2
5  3      month_1
6  3      month_2

Now, I want to do the following: I want to create a new column in df2, displaying the sum of an related to the ID, in the specific month. I.e., in the first df it can be seen that the person with ID 1 has a total sum of 3+2+3 = 8 in month 1 and 7 in month 2. I want to mutate this, respectively to the correct month. The desired output would be (Now computed by hands)

desired_df
  ID month_number grand_total
1  1      month_1           8
2  1      month_2           7
3  2      month_1           3
4  2      month_2           6
5  3      month_1           7
6  3      month_2          11

I want to show my code, but I do not have any. Could anyone help me?

Maurice
  • 139
  • 1
  • 7
  • Merge/join by ID and summarise? Actually looks like `melt`ing df might be sufficient. – NelsonGon Sep 26 '19 at 15:21
  • But how does the computer know to which month he should map it, as in ```df``` the month number is displayed as a column title – Maurice Sep 26 '19 at 15:24
  • 1
    Try this with `dplyr` and `tidyr`: `df %>% tidyr::pivot_longer(contains("month"),"key",values_to = "val") %>% group_by(key,cat) %>% summarise(Sum =sum(val))` – NelsonGon Sep 26 '19 at 15:25
  • Alright! I'll be on it right know – Maurice Sep 26 '19 at 15:26
  • 1
    ```library(dplyr); library(tidyr); df1 %>% select(-cat) %>% group_by(ID) %>% summarise_all(sum) %>% gather(month_number, grand_total, -ID)``` – M-- Sep 26 '19 at 15:38
  • 1
    Or ```df1 %>% gather(month_number, grand_total, -c(ID,cat)) %>% group_by(ID, month_number) %>% summarise(grand_total = sum(grand_total))``` – M-- Sep 26 '19 at 15:40

1 Answers1

1

This is pretty easy to do once you get the data into "tidy" format, which we can do using gather (at some point this will be replaced by pivot_longer as NelsonGon commented above, but that's not in the CRAN release yet). I've used contains("month") to select all of the columns that contain that string, but if you have actual month names as columns, you could do Jan:Dec or literally type them out to select them instead. At that point, group_by your identifier column(s) and then use summarize to compute your sum within each group.

df %>% 
  gather(key = "month_number", value = "count", contains("month")) %>% 
  group_by(ID, month_number) %>% 
  summarize(grand_total = sum(count))

There's no reason to join in this instance since gather does what you need.

GenesRus
  • 1,057
  • 6
  • 16