If the variables contain only 0 or 1, then the following could be used:
with(data_tot,
     rowSums(cbind(X5_A_01_d_Heart.Disease,
                   X5_A_01_e_Stroke,
                   X5_A_01_f_Chronic.Kidney.Disease))
)
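To make the 0/1 case concrete, here is a minimal sketch on made-up data. The `toy` data frame, its values, and the `n_conditions` column name are hypothetical and only for illustration; the three indicator names are taken from the snippet above.

```r
# Hypothetical 0/1 indicators; the values are invented for illustration.
toy <- data.frame(
  X5_A_01_d_Heart.Disease          = c(1, 0, 1, 0),
  X5_A_01_e_Stroke                 = c(0, 0, 1, 1),
  X5_A_01_f_Chronic.Kidney.Disease = c(1, 0, 1, 0)
)

# Each row's sum is the number of conditions present for that person.
toy$n_conditions <- with(toy,
  rowSums(cbind(X5_A_01_d_Heart.Disease,
                X5_A_01_e_Stroke,
                X5_A_01_f_Chronic.Kidney.Disease))
)

print(toy$n_conditions)
```

Because the indicators are already 0/1, the row sum is the count directly; no recoding is needed.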
Edit:
If they are instead coded as 1 (yes) and 2 (no), other risk factors such as blood pressure and cholesterol level are to be included, and there are no missing values in these risk factor variables, then you can use something similar to the following:
library(dplyr)

data_tot %>%
  mutate(CVD_Risk.Factors =
           (Heart == 1) +
           (Stroke == 1) +
           (CKD == 1) +
           (Systolic_BP >= 130) + (Diastolic_BP >= 80) +
           (Cholesterol > 150))
  Heart Stroke CKD Systolic_BP Diastolic_BP Cholesterol CVD_Risk.Factors
1     1      1   2         118           90         200                4
2     2      1   2         125           65         150                1
3     2      1   1         133           95         190                5
4     1      1   2         120           87         250                4
5     2      2   2         155          110          NA               NA
6     2      2   2         130          105         140                2
As row 5 shows, a single missing value makes the whole sum NA, so this does not work when there are missing values. One solution is to use rowwise() and then sum(), which has an na.rm argument.
data_tot %>%
  rowwise() %>%  # This tells R to apply a function by the rows of the selected inputs
  mutate(CVD_Risk.Factors = sum(  # This function has an "na.rm" argument
    (Heart == 1),
    (Stroke == 1),
    (CKD == 1),
    (Systolic_BP >= 130), (Diastolic_BP >= 80),
    (Cholesterol > 150), na.rm = TRUE)) %>%  # Omit NA in the summations
  ungroup()  # Drop the rowwise grouping so later steps are not applied row by row
# A tibble: 6 x 7
  Heart Stroke   CKD Systolic_BP Diastolic_BP Cholesterol CVD_Risk.Factors
  <dbl>  <dbl> <dbl>       <dbl>        <dbl>       <dbl>            <int>
1     1      1     2         118           90         200                4
2     2      1     2         125           65         150                1
3     2      1     1         133           95         190                5
4     1      1     2         120           87         250                4
5     2      2     2         155          110          NA                2 # not NA
6     2      2     2         130          105         140                2
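If you would rather avoid rowwise(), which can be slow on large data, the same count can probably be obtained in base R by binding the logical conditions into a matrix and calling rowSums() with na.rm = TRUE. This is only a sketch of an alternative, not a replacement for the approach above; the `flags` and `n_missing` names are my own, and the data frame is repeated from the Data section so the block runs on its own.

```r
# Same data as in the "Data" section, repeated so this block is self-contained.
data_tot <- data.frame(Heart        = c(1, 2, 2, 1, 2, 2),
                       Stroke       = c(1, 1, 1, 1, 2, 2),
                       CKD          = c(2, 2, 1, 2, 2, 2),
                       Systolic_BP  = c(118, 125, 133, 120, 155, 130),
                       Diastolic_BP = c(90, 65, 95, 87, 110, 105),
                       Cholesterol  = c(200, 150, 190, 250, NA, 140))

# One logical column per risk factor; a missing input gives NA in that cell.
flags <- with(data_tot, cbind(Heart == 1,
                              Stroke == 1,
                              CKD == 1,
                              Systolic_BP >= 130,
                              Diastolic_BP >= 80,
                              Cholesterol > 150))

# Count the TRUE cells per row, skipping NA cells.
data_tot$CVD_Risk.Factors <- rowSums(flags, na.rm = TRUE)

# Hypothetical extra column: how many risk factors could not be evaluated.
data_tot$n_missing <- rowSums(is.na(flags))

print(data_tot)
```

Note that with na.rm = TRUE, in either version, a missing value is counted as "risk factor absent". Keeping something like `n_missing` alongside the count makes it easier to spot rows where the total may be an undercount.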
Data:
data_tot <- data.frame(Heart        = c(1, 2, 2, 1, 2, 2),
                       Stroke       = c(1, 1, 1, 1, 2, 2),
                       CKD          = c(2, 2, 1, 2, 2, 2),
                       Systolic_BP  = c(118, 125, 133, 120, 155, 130),
                       Diastolic_BP = c(90, 65, 95, 87, 110, 105),
                       Cholesterol  = c(200, 150, 190, 250, NA, 140))