1

I have a dataset DF

   structure(list(Company= c("ABC", "ABC", 
"ABC", "ABC", "ABC", 
"ABC", "ABC", "XYZ", 
"XYZ", "XYZ"), year = 1951:1960, 
    dummyconflict = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = c("0", "1"), class = "factor")), row.names = 2:11, class = "data.frame")

I want to add another column such that it increases counts upwards. That is should a Company move from level “1” to “0” over a year, the count starts with one and if it has level “1” for the year after the count continues; 2,3,4,5,6 etc. Should it however move back to “0” again, the count starts over again with zero..

Please help in adding another column based on above condition

EXPECTED RESULTS in image

enter image description here

Vaibhav Singh
  • 1,159
  • 1
  • 10
  • 25

1 Answers1

0
df = structure(list(Company= c("ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"), 
                    year = 1951:1960, 
                    dummyconflict = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor")), 
                  row.names = 2:11, class = "data.frame")

library(dplyr)
library(data.table)

df %>%
  mutate(dummyconflict = as.numeric(as.character(dummyconflict))) %>% # update column to numeric
  group_by(Company) %>%                                               # for each company
  mutate(dummy2 = ifelse(row_number() == 1, 0, dummyconflict)) %>%    # create dummy2 variable to ignore 1s in first row
  group_by(Company, flag = rleid(dummy2)) %>%                         # create another group based on 1s and 0s positions and group by that and company
  mutate(NewVar = cumsum(dummy2)) %>%                                 # get cumulative sum of dummy2 column
  ungroup() %>%                                                       # forget the grouping
  select(Company, year, dummyconflict, NewVar)                        # keep relevant columns

# # A tibble: 10 x 4
#   Company  year dummyconflict NewVar
#   <chr>   <int>         <dbl>  <dbl>
# 1 ABC      1951             0      0
# 2 ABC      1952             0      0
# 3 ABC      1953             1      1
# 4 ABC      1954             1      2
# 5 ABC      1955             1      3
# 6 ABC      1956             0      0
# 7 ABC      1957             1      1
# 8 XYZ      1958             1      0
# 9 XYZ      1959             1      1
#10 XYZ      1960             1      2

It would be good to run this process step by step to make sure you get how it works so you can easily spot any bugs when you apply it to your big dataset.

AntoniosK
  • 15,991
  • 2
  • 19
  • 32