I have a dataset from a survey that has several similar variables because of the way the survey had to be set up. For instance, I have 20 different variables for the price of a medium soda in 2016. Each facility has a response on only one medium soda question (which one depended on the type of facility). I would like to combine these in R to get a single medium soda variable for all facilities. An example of what the data looks like is below.

Q5a_MediumSoda_Coffee: 2.25, 3.35, NA, NA, NA, NA, NA...
Q6a_Mediumsoda_Burgers: NA, NA, 2.50, NA, NA, NA, NA...
Q7a_MediumSoda_Thai: NA, NA, NA, NA, 2.30, 1.50, 2.75...

I attempted to combine all these variables into one by adding them together:

MediumSoda2016<-sum(Q5a_16_MedS_FSCoff+Q7a_16_MedS_FSAsian+Q9a_16_MedS_FSAmer+Q11a_16_MedS_FSDeli+Q13a_16_MedS_FSMex+Q15a_16_MedS_FSPizza+Q17a_16_MedS_FSPub+Q19a_16_MedS_FSBurgers+Q21a_16_MedS_FSItalian+Q23a_16_MedS_FSBBQRibs+Q25a_16_MedS_FSSeafood+Q27a_16_MedS_FSMed_Greek+Q29a_16_MedS_FSIndian+Q31a_16_MedS_FSOther, na.rm=TRUE)*

However, I get the following error:

Error in Q5a_16_MedS_FSCoff + Q7a_16_MedS_FSAsian + Q9a_16_MedS_FSAmer +  : 
  non-numeric argument to binary operator

I checked and all variables are numeric so I assume it is an issue with the sum function (and I am using the wrong function), but cannot seem to figure out what code to use. My hope is to combine all of these so I have one column of medium soda data with prices for each facility in this column. Any help would be greatly appreciated.

  • Seems like one of your variables is not numeric or you typed it wrong. Not a lot to go on here without a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Combining both `sum()` and `+` seems odd. It's unclear to me whether you want a single value, or a single value per row. Have you considered putting all these variables into a proper data frame rather than working with a bunch of different vectors? – MrFlick Aug 23 '17 at 20:09
  • There's an asterisk at the end of your call: `..., na.rm=TRUE)*`. Whatever starts the next line probably isn't numeric. – Nathan Werth Aug 23 '17 at 20:20

1 Answer

Take MrFlick's advice and combine your data into a data frame first. Then use the `ifelse` function to create a new variable that falls back to the next column whenever the value found so far is NA.

# assuming the three "variables" are numeric vectors of equal length
df <- data.frame(Q5a_MediumSoda_Coffee, Q6a_Mediumsoda_Burgers, Q7a_MediumSoda_Thai)

# ifelse() is vectorized: start with the first column and, wherever it is NA,
# take the value from the second column instead
df$MediumSoda <- ifelse(is.na(df$Q5a_MediumSoda_Coffee), df$Q6a_Mediumsoda_Burgers, df$Q5a_MediumSoda_Coffee)
# rows that are still NA fall back to the third column
df$MediumSoda <- ifelse(is.na(df$MediumSoda), df$Q7a_MediumSoda_Thai, df$MediumSoda)
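
With 20 columns, chaining `ifelse` calls by hand gets tedious, so the same idea can be folded over all the columns at once. The following is only a minimal sketch: the data frame is rebuilt from the first seven values shown in the question, and `soda_cols`, `first_non_na`, `added` and `MediumSoda2016` are illustrative names. It also assumes the soda columns can be picked out by a shared name pattern, which may not hold for the real survey file. The `added` line shows why the `+` approach would not give per-facility prices even without the error: any `NA` in a row makes the whole sum `NA`.

```r
# Illustrative data based on the first seven values shown in the question
df <- data.frame(
  Q5a_MediumSoda_Coffee  = c(2.25, 3.35, NA, NA, NA, NA, NA),
  Q6a_Mediumsoda_Burgers = c(NA, NA, 2.50, NA, NA, NA, NA),
  Q7a_MediumSoda_Thai    = c(NA, NA, NA, NA, 2.30, 1.50, 2.75)
)

# Element-wise `+` propagates NA, so every row here comes out as NA
added <- df$Q5a_MediumSoda_Coffee + df$Q6a_Mediumsoda_Burgers + df$Q7a_MediumSoda_Thai

# Pick the soda columns by name (assumes a common pattern in the column names)
soda_cols <- grep("MediumSoda", names(df), ignore.case = TRUE, value = TRUE)

# Keep x where it is present, otherwise fall back to y (vectorized over rows)
first_non_na <- function(x, y) ifelse(is.na(x), y, x)

# Fold the helper over all selected columns, left to right
df$MediumSoda2016 <- Reduce(first_non_na, df[soda_cols])
```

A row with no response in any column stays `NA` with this approach. `rowSums(df[soda_cols], na.rm = TRUE)` would give the same prices when exactly one column is filled in per row, but it returns 0 rather than `NA` for a row with no response at all.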
sweetmusicality