
I am trying to add a column that counts how many criteria each row meets. For example, I am looking at how many risk factors for heart disease each person has, and I want to run an ordinal regression on those counts. I have tried:

cvd_status <- ifelse( data_tot$X5_A_01_d_Heart.Disease=="1"|data_tot$X5_A_01_e_Stroke=="1"|data_tot$X5_A_01_f_Chronic.Kidney.Disease==1, 1,0) 

but that only tells me whether each person has any of the risk factors, not how many. Is there a way to count how many risk factors each person has?

Edit: The variables are not simply binary (0/1). They are either coded as 1s and 2s, or are ranges of numbers (continuous measurements).

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 12 '20 at 02:44



If the variables contain only 0s and 1s, adding them up across each row with rowSums gives the count directly:

with(data_tot,
     rowSums(cbind(X5_A_01_d_Heart.Disease, 
                   X5_A_01_e_Stroke,
                   X5_A_01_f_Chronic.Kidney.Disease))
)
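A minimal sketch of the same idea on a small made-up data frame, assuming 0/1 coding. The column names heart, stroke and ckd are hypothetical stand-ins for the question's variables. Adding na.rm = TRUE makes a missing answer count as "not met" instead of turning the whole row into NA.

```r
# Toy data (hypothetical column names): 1 = risk factor present, 0 = absent
df01 <- data.frame(heart  = c(1, 0, 1, 0),
                   stroke = c(0, 0, 1, NA),
                   ckd    = c(1, 0, 1, 1))

# Columns that hold the risk factors, selected by name
risk_cols <- c("heart", "stroke", "ckd")

# rowSums() adds across the selected columns for every row at once;
# na.rm = TRUE skips the missing stroke value in row 4 instead of returning NA
df01$n_risk <- rowSums(df01[risk_cols], na.rm = TRUE)

# Counts per row: 2, 0, 3, 1
df01$n_risk
```

Selecting the columns through a character vector keeps the list of risk factors in one place if more are added later.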

Edit:

And if they are coded as 1 (yes) and 2 (no), and other risk factors such as blood pressure and cholesterol level are to be included, compare each variable with its criterion and add up the results (TRUE counts as 1, FALSE as 0). Provided there are no missing values in these risk factor variables, you can use something similar to the following:

library(dplyr)

data_tot %>%
  mutate(CVD_Risk.Factors=
           (Heart == 1) +       # each comparison gives TRUE/FALSE,
           (Stroke == 1) +      # which + treats as 1/0
           (CKD == 1) +
           (Systolic_BP  >= 130) + (Diastolic_BP >= 80) +
           (Cholesterol > 150))

  Heart Stroke CKD Systolic_BP Diastolic_BP Cholesterol CVD_Risk.Factors
1     1      1   2         118           90         200                4
2     2      1   2         125           65         150                1
3     2      1   1         133           95         190                5
4     1      1   2         120           87         250                4
5     2      2   2         155          110          NA               NA
6     2      2   2         130          105         140                2

As row 5 shows, a missing value in any of these variables makes the whole sum NA. One solution is to use rowwise() and then sum() with na.rm=TRUE:

data_tot %>%
  rowwise() %>%  # This tells R to apply a function by the rows of the selected inputs
  mutate(CVD_Risk.Factors=sum(  # This function has an "na.rm" argument
           (Heart == 1), 
           (Stroke == 1), 
           (CKD == 1),
           (Systolic_BP  >= 130), (Diastolic_BP >= 80),
           (Cholesterol > 150), na.rm=TRUE))  # Omit NA in the summations

# A tibble: 6 x 7
  Heart Stroke   CKD Systolic_BP Diastolic_BP Cholesterol CVD_Risk.Factors
  <dbl>  <dbl> <dbl>       <dbl>        <dbl>       <dbl>            <int>
1     1      1     2         118           90         200                4
2     2      1     2         125           65         150                1
3     2      1     1         133           95         190                5
4     1      1     2         120           87         250                4
5     2      2     2         155          110          NA                2 # not NA
6     2      2     2         130          105         140                2
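rowwise() evaluates the expression one row at a time, which can be slow on large data sets. A vectorized alternative, sketched here in base R with the same example data, builds one logical column per criterion and counts the TRUEs per row with rowSums(). rowSums() treats TRUE as 1 and FALSE as 0, and na.rm = TRUE drops the NA comparison, so row 5 again gets 2.

```r
data_tot <- data.frame(Heart=c(1,2,2,1,2,2),
                       Stroke=c(1,1,1,1,2,2),
                       CKD=c(2,2,1,2,2,2),
                       Systolic_BP=c(118,125,133,120,155,130),
                       Diastolic_BP=c(90,65,95,87,110,105),
                       Cholesterol=c(200,150,190,250,NA,140))

# One logical column per criterion (TRUE = met, FALSE = not met, NA = unknown)
criteria <- with(data_tot, cbind(
  Heart == 1,
  Stroke == 1,
  CKD == 1,
  Systolic_BP >= 130,
  Diastolic_BP >= 80,
  Cholesterol > 150
))

# Count the TRUEs in each row for all rows at once; NA comparisons are skipped
data_tot$CVD_Risk.Factors <- rowSums(criteria, na.rm = TRUE)
data_tot$CVD_Risk.Factors
#> [1] 4 1 5 4 2 2
```

The same rowSums(cbind(...), na.rm = TRUE) expression can also be used inside mutate() without rowwise(), because mutate() passes whole columns to it.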

Data:

data_tot <- data.frame(Heart=c(1,2,2,1,2,2),
                       Stroke=c(1,1,1,1,2,2),
                       CKD=c(2,2,1,2,2,2),
                       Systolic_BP=c(118,125,133,120,155,130),
                       Diastolic_BP=c(90,65,95,87,110,105),
                       Cholesterol=c(200,150,190,250,NA,140))
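The question mentions running an ordinal regression on the count. Ordinal models such as MASS::polr() expect the response to be an ordered factor, so the count usually needs converting first. A minimal sketch, assuming the count is computed as in this answer (recomputed here so the block runs on its own); the column name CVD_Risk.Ord is hypothetical.

```r
data_tot <- data.frame(Heart=c(1,2,2,1,2,2),
                       Stroke=c(1,1,1,1,2,2),
                       CKD=c(2,2,1,2,2,2),
                       Systolic_BP=c(118,125,133,120,155,130),
                       Diastolic_BP=c(90,65,95,87,110,105),
                       Cholesterol=c(200,150,190,250,NA,140))

# Number of criteria met per row (missing comparisons are ignored)
data_tot$CVD_Risk.Factors <- with(data_tot, rowSums(cbind(
  Heart == 1, Stroke == 1, CKD == 1,
  Systolic_BP >= 130, Diastolic_BP >= 80, Cholesterol > 150
), na.rm = TRUE))

# Ordered factor: levels are the observed counts sorted numerically
data_tot$CVD_Risk.Ord <- factor(data_tot$CVD_Risk.Factors, ordered = TRUE)
data_tot$CVD_Risk.Ord
#> [1] 4 1 5 4 2 2
#> Levels: 1 < 2 < 4 < 5
```

The ordered column can then serve as the response of an ordinal model such as MASS::polr(); six rows are far too few to fit one, so the sketch stops at the conversion.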
Edward
  • Is there a way to do it if the variables are not simply binary? – Akshay Nathan Mar 12 '20 at 02:56
  • What are they? – Edward Mar 12 '20 at 04:02
  • They are either 1 for yes or 2 for no with the above variables, or are on a linear scale like blood pressure or cholesterol. For the latter, I want to know whether the row has a value above or below a set value. – Akshay Nathan Mar 12 '20 at 17:25
  • Ahhh, that makes quite a big difference. Please see my edited code which provides two options if there are NA in any variable. Let me know if it works or not. :) – Edward Mar 13 '20 at 02:10
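For the mix described in the comments (1 = yes / 2 = no variables alongside continuous measurements compared against a cut-off), one way to keep every criterion in a single place is a named list of test functions. This is a sketch under assumptions: the names risk_rules and count_met() are hypothetical, and the cut-offs are simply copied from the example in the answer, not clinical guidance.

```r
# Hypothetical rule set: each entry takes the data frame and returns
# TRUE/FALSE/NA per row. Cut-offs are copied from the answer's example.
risk_rules <- list(
  heart     = function(d) d$Heart == 1,          # 1 = yes, 2 = no
  stroke    = function(d) d$Stroke == 1,
  ckd       = function(d) d$CKD == 1,
  high_sbp  = function(d) d$Systolic_BP >= 130,  # continuous vs. cut-off
  high_dbp  = function(d) d$Diastolic_BP >= 80,
  high_chol = function(d) d$Cholesterol > 150
)

# Hypothetical helper: apply every rule, bind the results column-wise into
# a logical matrix, and count the TRUEs per row (NA results are ignored)
count_met <- function(d, rules) {
  met <- do.call(cbind, lapply(rules, function(rule) rule(d)))
  rowSums(met, na.rm = TRUE)
}

data_tot <- data.frame(Heart=c(1,2,2,1,2,2),
                       Stroke=c(1,1,1,1,2,2),
                       CKD=c(2,2,1,2,2,2),
                       Systolic_BP=c(118,125,133,120,155,130),
                       Diastolic_BP=c(90,65,95,87,110,105),
                       Cholesterol=c(200,150,190,250,NA,140))

data_tot$CVD_Risk.Factors <- count_met(data_tot, risk_rules)
data_tot$CVD_Risk.Factors
#> [1] 4 1 5 4 2 2
```

With this layout, adding or changing a risk factor means editing one line of risk_rules, and the named columns of the intermediate matrix make it easy to check which criteria each row met.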