1

I have 2 dummy variables

  1. physical_violence and
  2. sexual_violence.

I tried to combine them with the ifelse() function and the | operator to create a dummy variable that returns 1 if at least one type of violence has occurred. The approach below gives results I did not expect:

df <- mutate(df, physical_violence = ifelse(e03bidummy == 1 | e03cidummy == 1 |
  e03didummy == 1 | e03eidummy == 1 | e03fidummy == 1 |
  e03gidummy == 1 | e03hidummy == 1 | e03iidummy == 1 |
  e03jidummy == 1, 1, 0))
df <- mutate(df, sexual_violence = ifelse(e04aidummy == 1 |
  e04bidummy == 1 | e04cidummy == 1 | e04didummy == 1, 1, 0))

The code for the dummy combining the two variables above:

df <- mutate(df, physical_sexual_violence = 
ifelse(physical_violence == 1 | sexual_violence == 1, 1, 0))

The results I got are: table(df$physical_sexual_violence): # 875 "yes", 26.614 "no". This contradicts:

  1. table(df$physical_violence): # 846 "yes" (3.07%) and 26.643 "no"
  2. table(df$sexual_violence) # 634 "yes" and 26.855 "no".

I expect 1.480 cases of violence.

Could anyone please help me identify what I am doing wrong?

user438383
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Sep 13 '21 at 16:54
  • To get 1480 cases of violence you would have to sum the 1s from both columns. Your code instead counts the rows in which at least one of the two forms of violence is present. – GuedesBF Sep 13 '21 at 19:49
  • If one subject is positive for both violence types, are you counting this as one or two cases of violence? Your code says it is one, but the expected count of 1480 says otherwise. – GuedesBF Sep 13 '21 at 19:50
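
The point made in the comments above can be illustrated with a small sketch. The vectors below are made-up toy data, not the asker's survey, and the final line only applies inclusion-exclusion to the three counts reported in the question, assuming those counts come from the same rows with no missing values:

```r
# Toy indicators (hypothetical data, for illustration only)
physical_violence <- c(1, 1, 0, 0, 1)
sexual_violence   <- c(1, 0, 1, 0, 1)

# Rows with at least one type of violence (what the | operator counts)
either <- as.integer(physical_violence == 1 | sexual_violence == 1)
# Rows with both types of violence
both <- as.integer(physical_violence == 1 & sexual_violence == 1)

sum(physical_violence) + sum(sexual_violence)  # adds up the 1s, double-counting overlap
sum(either)                                    # counts each row at most once
sum(both)                                      # the rows that were double-counted

# Inclusion-exclusion on the counts reported in the question:
# |A or B| = |A| + |B| - |A and B|, so |A and B| = 846 + 634 - 875
overlap_reported <- 846 + 634 - 875
overlap_reported
```

Under that assumption, the gap between 1480 and 875 would simply be the number of respondents who reported both types of violence.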

2 Answers

1

Whenever row-wise logical operations can be reduced to a single TRUE/FALSE per row, we can use dplyr::if_any or dplyr::if_all.
-) First expression in mutate(): if any of the variables whose names match the regex "e03[b-j]idummy" satisfies .x==1, physical_violence will be +TRUE (which evaluates to 1).
-) The second expression uses the same logic, with the other variables you gave.
-) The third expression outputs 1 if any of the two new columns is 1.

dummy data

  e03bidummy e03cidummy e04aidummy e04bidummy
1          1          0          0          0
2          0          1          0          0
3          0          0          1          1
4          0          0          0          0

solution with dplyr

library(dplyr)

df %>% mutate(physical_violence = +if_any(matches("e03[b-j]idummy"), ~.x==1),
              sexual_violence = +if_any(matches("e04[a-d]idummy"), ~.x==1),
              physical_sexual_violence= +if_any(contains('violence')))

  e03bidummy e03cidummy e04aidummy e04bidummy physical_violence sexual_violence physical_sexual_violence
1          1          0          0          0                 1               0                        1
2          0          1          0          0                 1               0                        1
3          0          0          1          1                 0               1                        1
4          0          0          0          0                 0               0                        0

If all the dummy variables are strictly 0 or 1, the code can be simplified further by omitting the .x==1 part, because 0/1 values are implicitly coerced to FALSE/TRUE in logical operations:

df %>% mutate(physical_violence = +if_any(matches("e03[b-j]idummy")),
              sexual_violence = +if_any(matches("e04[a-d]idummy")),
              physical_sexual_violence= +if_any(contains('violence')))
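
For comparison, here is a minimal base R sketch of the same idea without dplyr. It rebuilds the four-column dummy data shown above and is not part of the original answer; the helper names phys_cols and sex_cols are hypothetical, and the sketch assumes the dummy columns contain no missing values:

```r
# Recreate the dummy data from above
df <- data.frame(e03bidummy = c(1, 0, 0, 0),
                 e03cidummy = c(0, 1, 0, 0),
                 e04aidummy = c(0, 0, 1, 0),
                 e04bidummy = c(0, 0, 1, 0))

# Select the columns by the same regular expressions (anchored here)
phys_cols <- grep("^e03[b-j]idummy$", names(df), value = TRUE)
sex_cols  <- grep("^e04[a-d]idummy$", names(df), value = TRUE)

# A row is flagged when at least one of its selected columns equals 1
df$physical_violence <- as.integer(rowSums(df[phys_cols] == 1) > 0)
df$sexual_violence   <- as.integer(rowSums(df[sex_cols] == 1) > 0)

# Combined indicator: 1 if either of the two new columns is 1
df$physical_sexual_violence <-
  as.integer(df$physical_violence == 1 | df$sexual_violence == 1)

df
```

With missing values present, rowSums() would return NA for the affected rows unless na.rm = TRUE is passed, so that choice would need to be made deliberately.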
GuedesBF
  • Thank you! That is a simpler approach than what I've done. Regarding your other comment, if one subject is positive for both types of violence, I'm counting it as only one case. That cleared up my misunderstanding. – Andressa TB Sep 14 '21 at 08:29
0

Does this help? Of course you need to adapt for your variable names.

Sample dataframe:

# just a synthetic sample dataframe
df <- data.frame(physical_violence = c(0, 0, 1, 0, 1), # assuming no NAs
                 sexual_violence = c(0, 1, 1, 1, 0)) # assuming no NAs 

for-loop + if-else statement:

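df$dummy <- NA  # initialise the column once, before the loop
for (i in seq_len(nrow(df))) {
  # FALSE only when both indicators are 0; TRUE otherwise
  if (df$physical_violence[i] == 0 && df$sexual_violence[i] == 0) {
    df$dummy[i] <- FALSE
  } else {
    df$dummy[i] <- TRUE
  }
}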

Output:

df
#>   physical_violence sexual_violence dummy
#> 1                 0               0 FALSE
#> 2                 0               1  TRUE
#> 3                 1               1  TRUE
#> 4                 0               1  TRUE
#> 5                 1               0  TRUE

Created on 2021-09-13 by the reprex package (v2.0.1)

Note, this approach is neither the fastest nor the safest way, but the syntax is easy to understand for beginners. EDIT: If you need 0-1, just replace TRUE by 1 and FALSE by 0. (Do not forget to change df$dummy to a factor variable if needed.)
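
As a point of comparison, the same result can be sketched without a loop, since | is vectorised in R. This is an alternative to the loop above rather than part of the original answer; the column name dummy01 is hypothetical, and the data are the same synthetic sample with no NAs:

```r
# Same synthetic sample dataframe as above (assuming no NAs)
df <- data.frame(physical_violence = c(0, 0, 1, 0, 1),
                 sexual_violence   = c(0, 1, 1, 1, 0))

# Vectorised OR: one TRUE/FALSE per row, no explicit loop needed
df$dummy <- df$physical_violence == 1 | df$sexual_violence == 1

# Optional 0/1 version of the same indicator
df$dummy01 <- as.integer(df$dummy)

df
```

The vectorised form compares whole columns at once, which tends to be faster on large data frames than assigning row by row.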

Pax