I have a data frame in long format covering multiple cities. Each city has a value for each month and for each code (the codes run from 100 up to 1,000). My data frame looks like this:

Code City month Data
100 A 10 0
100 B 12 1
100 A 10 2
100 B 12 3
100 A 10 4
100 B 12 5
200 A 10 10
200 B 12 11
200 A 10 12
200 B 12 13
200 A 10 14
200 B 12 15

I'm trying to create a new variable that sums the Data variable for each month, using only the rows where Code equals 100. So for month 10 the result would be 6, and for month 12 it would be 9. That result should be repeated on every row of the corresponding month:

newvar
6
9
6
9
6
9
6
9
6
9
6
9

For this I'm using dplyr:

df <- df %>%
  group_by(month) %>%
  mutate(newvar = case_when(Code == 100 ~ as.integer(rowSums(select_(., "Data"), na.rm = TRUE))))

However, I'm getting an error and haven't been able to create this new variable correctly. I know base R might be an easier way, but I want to use dplyr.

Any help is really appreciated!

2 Answers

Within each month group, you can subset Data to the rows where Code == 100 and sum only those values. mutate() then repeats that sum on every row of the group.

library(dplyr)

df %>%
  group_by(month) %>%
  mutate(newvar = sum(Data[Code == 100], na.rm = TRUE)) %>%
  ungroup()
Ronak Shah
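
To check this approach against the expected output, here is a minimal sketch that rebuilds the sample data from the question and applies the same grouped sum. The data.frame() construction and the name result are assumptions added for illustration; they do not appear in the original post.

```r
library(dplyr)

# Rebuild the sample data shown in the question (construction is illustrative)
df <- data.frame(
  Code  = rep(c(100, 200), each = 6),
  City  = rep(c("A", "B"), times = 6),
  month = rep(c(10, 12), times = 6),
  Data  = c(0:5, 10:15)
)

# For each month, sum Data over the rows where Code == 100,
# then repeat that sum on every row of the month
result <- df %>%
  group_by(month) %>%
  mutate(newvar = sum(Data[Code == 100], na.rm = TRUE)) %>%
  ungroup()

result$newvar
# month 10 rows get 6 (0 + 2 + 4), month 12 rows get 9 (1 + 3 + 5)
```

Because the subsetting happens inside sum(), rows with other codes still receive the month's total, which matches the output the question asks for.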

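We can also wrap case_when() inside sum(). Rows where Code is not 100 match no condition and become NA, and na.rm = TRUE drops them from the sum: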

library(dplyr)
df %>%
   group_by(month) %>%
   mutate(newvar = sum(case_when(Code == 100 ~ Data), na.rm = TRUE))
akrun
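
A small sketch of why the case_when() variant works: with no fallback condition, rows that match nothing come back as NA, so sum(..., na.rm = TRUE) only sees the Code == 100 values. The vectors Code and Data below are made-up illustration values, not the question's full data.

```r
library(dplyr)

# Hypothetical small vectors, for illustration only
Code <- c(100, 100, 200, 200)
Data <- c(1L, 3L, 11L, 13L)

# No default branch: rows where Code != 100 become NA
masked <- case_when(Code == 100 ~ Data)
masked

# na.rm = TRUE drops those NA values, leaving only the Code == 100 rows
total <- sum(masked, na.rm = TRUE)
total
```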