Making new variable conditional on responses for a set of other ones

Question

I'm new to programming here, so I apologize that this is a rudimentary question. I'm working in R.

My data set: in brief, I have a list of about 30 diseases (some include asthma, back pain, etc. which would not necessarily indicate disease but, rather, some sort of chronic condition). For each question, respondents answered "1" if they have the disease, "2" if not, and "88" if they didn't know their disease status.

I want to create a new variable, say, called "chronic", which captures all individuals that have either 1, 2, or 3+ chronic conditions. Of course, I could sit here and go through all the conditionals, but is there an efficient way to scan across each specific disease column, and if participants answered "1", create some sort of running sum such that the total in this new "chronic" variable indicates the number of chronic conditions they have?

Thanks in advance

Usually _no_ is represented as 0 and _yes_ as 1. I'd recommend recoding your values that way, and replacing 88 with `NA`. With that you can, for example, use sum() (with `na.rm = TRUE` once `NA`s are present) to count the number of positive answers for a given disease or respondent. The recoding is easily done with `df[df==88]=NA`, `df[df==2]=0`. – Molx, Mar 23 '15 at 01:19
@Molx Better idea.. I'll try this method. My df doesn't just contain the disease variables; there are other demographic vars, such as age, that are also coded as 2. Is there a way to select a subset of columns (the disease ones) and set the "2" values to "0" like you suggested? I'm coming up with something like this, but when run, nothing changes (8 and 24 are column numbers): `nhss[nhss[,c(8:24)==2]]=0` – Jason, Mar 23 '15 at 02:30
If you want to restrict your changes to specific columns, subset those columns first, then subset the cells you want to change: `nhss[,8:24][nhss[,8:24]==2]=0`. – Molx, Mar 23 '15 at 02:46
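To make the recoding in these comments concrete, here is a minimal sketch on toy data. The data frame `nhss_demo` and its column names are made up for illustration; in the real data the disease columns could be selected by position (for example `8:24`) instead of by name.

```r
# Toy data; `nhss_demo` and its column names are hypothetical.
nhss_demo <- data.frame(
  age      = c(2, 45, 88),   # demographic column that must stay untouched
  asthma   = c(1, 2, 88),
  backpain = c(1, 1, 2),
  diabetes = c(2, 88, 2)
)

# Assumed set of disease columns (a numeric range such as 8:24 works too).
disease_cols <- c("asthma", "backpain", "diabetes")

# Recode a copy of the disease columns only, then write it back.
d <- nhss_demo[, disease_cols]
d[d == 2]  <- 0    # "no" becomes 0
d[d == 88] <- NA   # "don't know" becomes missing
nhss_demo[, disease_cols] <- d

# Count "yes" answers per respondent, ignoring missing values.
nhss_demo$chronic <- rowSums(nhss_demo[, disease_cols], na.rm = TRUE)
```

Because only the disease columns are recoded, the values 2 and 88 in `age` are left alone.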

score 1 · Answer 1 · edited May 23 '17 at 11:50

You'll want to use one of the apply functions. In this case, if your data.frame is called df, you can count the responses equal to 1 in each row using apply(), as in:

apply(df,
      1,            # apply over rows (i.e. the first margin)
      function(x)   # a function that takes one row's worth of data as its argument
          sum(x == 1))  # and returns the number of responses equal to 1
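As a hedged illustration of the same idea, the sketch below restricts apply() to the disease columns, since a demographic column that happens to be coded 1 would otherwise be counted as well. The data frame `df_demo` and its column names are assumptions made up for the example; `na.rm = TRUE` is added only as a guard in case some answers are already `NA`.

```r
# Toy data with the original coding (1 = yes, 2 = no, 88 = don't know).
# `df_demo` and the column names are hypothetical.
df_demo <- data.frame(
  sex      = c(1, 2, 1),    # demographic column, not a disease
  asthma   = c(1, 2, 1),
  backpain = c(1, 88, 1),
  diabetes = c(2, 2, 1)
)
disease_cols <- c("asthma", "backpain", "diabetes")

# For each row, count how many disease answers equal 1.
df_demo$chronic <- apply(
  df_demo[, disease_cols],
  1,
  function(x) sum(x == 1, na.rm = TRUE)
)
```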

answered Mar 23 '15 at 01:12

Jthorpe

This would be a lot easier: `rowSums( df == "1")` – IRTFM Mar 23 '15 at 01:24
Right as always @BondedDust :) – Jthorpe Mar 23 '15 at 01:33
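For completeness, a small sketch of the rowSums() route, followed by one possible way to collapse the count into the 0 / 1 / 2 / 3+ groups the question asks about. The toy data frame `df_demo` is hypothetical, and using cut() for the grouping is an assumption about what is wanted, not something stated in the thread.

```r
# Toy data containing only disease columns (1 = yes, 2 = no, 88 = don't know).
# `df_demo` and the column names are hypothetical.
df_demo <- data.frame(
  asthma    = c(1, 2, 1, 1, 2),
  backpain  = c(2, 2, 1, 1, 88),
  diabetes  = c(2, 88, 2, 1, 1),
  arthritis = c(2, 2, 2, 1, 2)
)

# Number of "yes" answers per respondent.
n_conditions <- rowSums(df_demo == 1)

# Collapse the count into the categories 0, 1, 2 and 3+.
chronic <- cut(
  n_conditions,
  breaks = c(-Inf, 0, 1, 2, Inf),
  labels = c("0", "1", "2", "3+")
)
```

If the data frame also holds demographic columns, the comparison would need to be limited to the disease columns first, for example `rowSums(df_demo[, disease_cols] == 1)` with `disease_cols` defined by the user.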
