0

I am fairly new to R and happen to be stuck with the following problem: I want to fill in an empty data frame based on multiple criteria and sum the variable of interest. The data frames look like this:

The empty matrix, that I want to fill:

dat <- data.frame(year = 2021, 
                  month = 1:12,
                  green = "N/A",
                  yellow = "N/A",
                  red = "N/A",
                  blue = "N/A",
                  purple = "N/A")

dat

   year month green yellow red blue purple
1  2021     1   N/A    N/A N/A  N/A    N/A
2  2021     2   N/A    N/A N/A  N/A    N/A
3  2021     3   N/A    N/A N/A  N/A    N/A
4  2021     4   N/A    N/A N/A  N/A    N/A
5  2021     5   N/A    N/A N/A  N/A    N/A
6  2021     6   N/A    N/A N/A  N/A    N/A
7  2021     7   N/A    N/A N/A  N/A    N/A
8  2021     8   N/A    N/A N/A  N/A    N/A
9  2021     9   N/A    N/A N/A  N/A    N/A
10 2021    10   N/A    N/A N/A  N/A    N/A
11 2021    11   N/A    N/A N/A  N/A    N/A
12 2021    12   N/A    N/A N/A  N/A    N/A

And data frame with the original data:

dat_org <- data.frame(year = c(2020,2020,2020,2020,2020,2021,2021,2021,2021,2021,2021,2021,2021), 
                      month = c(1,4,6,9,12,1,1,1,5,5,5,10,10),
                      height = c(23,35,76,87,55,43,21,12,45,67,78,23,34),
                      colour = c("green", "green", "red", "yellow", "purple", "green", "blue", "blue", "blue", "red", "yellow", "yellow", "red"))

dat_org

   year month height colour
1  2020     1     23  green
2  2020     4     35  green
3  2020     6     76    red
4  2020     9     87 yellow
5  2020    12     55 purple
6  2021     1     43  green
7  2021     1     21   blue
8  2021     1     12   blue
9  2021     5     45   blue
10 2021     5     67    red
11 2021     5     78 yellow
12 2021    10     23 yellow
13 2021    10     34    red

Now, I want to transfer the "height" values in dat_org to dat, based on the year, month and colour. If no values in dat_org based on these criteria exist, I want to replace N/A with 0. If there are several rows, that meet the conditions, I want to sum up the variable (as it happend for year = 2021, month = 1 and colour ="blue").

The result would look like this:

dat

   year month green yellow red blue purple
1  2021     1    43      0   0   33      0
2  2021     2     0      0   0    0      0
3  2021     3     0      0   0    0      0
4  2021     4     0      0   0    0      0
5  2021     5     0     78  67   45      0
6  2021     6     0      0   0    0      0
7  2021     7     0      0   0    0      0
8  2021     8     0      0   0    0      0
9  2021     9     0      0   0    0      0
10 2021    10     0     23  34    0      0
11 2021    11     0      0   0    0      0
12 2021    12     0      0   0    0      0

I was first thinking to use "aggregate" or "group by", but then it only lists the existing months of df_org (here 1,5,10), but I also want to fill in N/A in df with a zero, if the conditions are not fulfilled.

The logic is similar to sumifs in Excel, but I am not sure how to apply this in R.

halfer
  • 19,824
  • 17
  • 99
  • 186
Josie
  • 1
  • 2
  • Welcome to SO! It would be easier to help you if you provide [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including the code you have tried and a snippet of your data or some fake data. Please do not post an image of code/data/errors [for these reasons](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question/285557#285557). Just include the code, console output, or data (e.g., dput(head(x)) or data.frame(...)) directly. – stefan Aug 20 '22 at 08:15
  • This said: Using aggregate is the right way to do, e.g. using `dplyr`: `df %>% group_by(year, month, color) %>% summarise(x = sum(x), .groups = "drop")`. And if necessary you could use `...%>%pivot_wider(names_from = color, values_from = x, values_fill = 0)` to convert the result into wide format with the colors as columns and replace resulting missings by 0. – stefan Aug 20 '22 at 08:18

1 Answers1

0

This is basically the code from my comment which first aggregates the data and after that reshape to wide format. However, I added two pieces. First, I use tidyr::complete to add all months from 1 to 12 and finally I filter for year 2021

dat_org <- data.frame(year = c(2020,2020,2020,2020,2020,2021,2021,2021,2021,2021,2021,2021,2021), 
                      month = c(1,4,6,9,12,1,1,1,5,5,5,10,10),
                      height = c(23,35,76,87,55,43,21,12,45,67,78,23,34),
                      colour = c("green", "green", "red", "yellow", "purple", "green", "blue", "blue", "blue", "red", "yellow", "yellow", "red"))

library(dplyr)
library(tidyr)

colors <- unique(dat_org$colour)

dat_org %>% 
  group_by(year, month, colour) %>% 
  summarise(height = sum(height), .groups = "drop") %>% 
  complete(year, month = 1:12, colour, fill = list(height = 0))  %>% 
  pivot_wider(names_from = colour, values_from = height, values_fill = 0) %>%
  filter(year == 2021)
#> # A tibble: 12 × 7
#>     year month  blue green purple   red yellow
#>    <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>
#>  1  2021     1    33    43      0     0      0
#>  2  2021     2     0     0      0     0      0
#>  3  2021     3     0     0      0     0      0
#>  4  2021     4     0     0      0     0      0
#>  5  2021     5    45     0      0    67     78
#>  6  2021     6     0     0      0     0      0
#>  7  2021     7     0     0      0     0      0
#>  8  2021     8     0     0      0     0      0
#>  9  2021     9     0     0      0     0      0
#> 10  2021    10     0     0      0    34     23
#> 11  2021    11     0     0      0     0      0
#> 12  2021    12     0     0      0     0      0
stefan
  • 90,330
  • 6
  • 25
  • 51