R Create new variable based on group and multiple column condition

Question

Could someone please help me insert the group_by function (or any function that would do the trick)into this:

df$buy <-ifelse(
 ( 
 (df$month %in% c(201909, 201910,  201911, 201912, 202001, 202002)) &
  (df$activity== 0)                                  
  ),
  1,  
  0   
 )

My data has three columns: ID, Month, and X. Each id has 12-14 observations. I want to group by the id then assign a 1 to all the IDs in a group where month = 201009, 201919, 201911, 201912, 202001, 202002 and X in those rows = 0. For example, buy is the new variable I want to create:

 ID  MONTH  X Buy
 1   201901 1 1
 1   201002 0 1
 1   201903 1 1
 1   201904 0 1
 1   201905 1 1
 1   201906 1 1
 1   201907 0 1
 1   201908 0 1
 1   201909 0 1
 1   201910 0 1
 1   201911 0 1
 1   201912 0 1
 1   202001 0 1
 1   202002 0 1
 2   201901 1 0
 2   201902 1 0
 2   201903 0 0
 2   201904 0 0
 2   201905 0 0 
 2   201906 1 0 
 2   201907 0 0 
 2   201908 0 0 
 2   201909 0 0
 2   201910 1 0 
 2   201911 0 0 
 2   201912 1 0 
 2   202001 0 0 
 2   202002 0 0

I want a 1 for the entire group based on ID only if the months stated above have a zero in the X column, and a zero if it does not meet the requirements. Thank you in advance.

If Duck is not right, then it is likely because we don't understand enough about the input data and your expected output. Please make this question a little more reproducible by adding sample *unambiguous* data (e.g., `data.frame(x=...,y=...)` or the output from `dput(head(x))`) and intended output given that input. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. — r2evans, Dec 06 '20 at 01:38
@r2evans you are correct, I was not clear in my request. I have edited the post to include some of my data and my desired outcome. Thank you both. — Rorx, Dec 06 '20 at 04:06

score 1 · Accepted Answer · answered Dec 06 '20 at 11:34

I think this will serve your purpose

library(tidyverse)

df %>% filter(MONTH %in% c(201909, 201910,  201911, 201912, 202001, 202002)) %>%
  group_by(ID) %>% summarise(Buy = sum(X+1)) %>%
  mutate(Buy = ifelse(Buy == 6, 1, 0)) %>% right_join(df) %>%
  select(1, 3:4, 2)

   ID  MONTH X Buy
1   1 201901 1   1
2   1 201902 0   1
3   1 201903 1   1
4   1 201904 0   1
5   1 201905 1   1
6   1 201906 1   1
7   1 201907 0   1
8   1 201908 0   1
9   1 201909 0   1
10  1 201910 0   1
11  1 201911 0   1
12  1 201912 0   1
13  1 202001 0   1
14  1 202002 0   1
15  2 201901 1   0
16  2 201902 1   0
17  2 201903 0   0
18  2 201904 0   0
19  2 201905 0   0
20  2 201906 1   0
21  2 201907 0   0
22  2 201908 0   0
23  2 201909 0   0
24  2 201910 1   0
25  2 201911 0   0
26  2 201912 1   0
27  2 202001 0   0
28  2 202002 0   0

The code may have to be changed in case X has negative values.

deschen · Answer 2 · 2020-12-06T12:54:18.267

Since the TO didn't specify a proper exampel I created one of my one.

Generally, I'm wondering if this all could be done more easily, but it occured to me that this is probably one of the rare cases where it could make sense to first pivot_wider to calculate new variables. So this is what I'm doing:

Pivoting wider so that each month is its own column and each ID is a row.
Do a rowwise calculation on some months columns and check if at least one of them is 1.
Reshaping back to long format.

Note:

Since I created my own data frame I took some short cuts, so the values do not correspond to the TOs example.
I also just used some example columns for the row calculation, again not matching the columns the TO had provided.
However, this all can be adjusted easily in the code below:

Creating data

set.seed(1)
df <- data.frame(ID = c(rep(1:2, each = 12)),
                 MONTH = paste0(c(rep(2019, 12)), rep(1:12, 2)),
                 X = round(runif(24, 0,1),0))

Doing the calculations

library(tidyverse)
df %>%
  pivot_wider(id_cols      = ID,
              names_from   = MONTH,
              names_prefix = "MONTH_",
              values_from  = X) %>%
  mutate(Buy = apply(across(c(MONTH_20191, MONTH_20195, MONTH_201912)), 1, function(x) (any(x == 1))),
         Buy = as.numeric(Buy)) %>%
  pivot_longer(cols = starts_with("MONTH"),
               names_prefix = "MONTH_",
               names_to = "MONTH",
               values_to = "X") %>% print(n=Inf)

Result

# A tibble: 24 x 4
      ID   Buy MONTH      X
   <int> <dbl> <chr>  <dbl>
 1     1     0 20191      0
 2     1     0 20192      0
 3     1     0 20193      1
 4     1     0 20194      1
 5     1     0 20195      0
 6     1     0 20196      1
 7     1     0 20197      1
 8     1     0 20198      1
 9     1     0 20199      1
10     1     0 201910     0
11     1     0 201911     0
12     1     0 201912     0
13     2     1 20191      1
14     2     1 20192      0
15     2     1 20193      1
16     2     1 20194      0
17     2     1 20195      1
18     2     1 20196      1
19     2     1 20197      0
20     2     1 20198      1
21     2     1 20199      1
22     2     1 201910     0
23     2     1 201911     1
24     2     1 201912     0

R Create new variable based on group and multiple column condition

2 Answers2

Creating data

Doing the calculations

Result

Linked