I want to create a dummy variable, and I need to combine several conditions on other variables to assign the value 0 to it. One of the variables used in those conditions is newly created, and that is where I run into the problem.

attach(originaldata)
originaldata$hk_nonagr[hktype == 2 |hktype ==4 |hktype == 5] <- 1
originaldata$hk_nonagr[hktype == 1 |hktype == 3 |hktype == 6 |hktype == 7] <- 0
originaldata$hk_nonagr <- factor(originaldata$hk_nonagr,
                             levels = c(1, 0),
                             labels = c("yes", "no"))

This is my newly created variable. Then I want to use it to create another variable.

originaldata$hk_effort[urbanhk == 1|urbanhk == 2|urbanhk == 3|urbanhk == 4|urbanhk == 7] <- 1
originaldata$hk_effort[originaldata$hk_nonagr == 0 |yr_urbanhk == 9997|r_urbanhk == 5|r_urbanhk == 6|r_urbanhk ==8|r_urbanhk ==9] <- 0

Here I get the problem. Value 0 cannot be assigned. I tried

 originaldata$hk_effort[originaldata$hk_nonagr == 0] <- 0

That doesn't work either, which is why I think the problem lies with the newly created variable. I get the same problem whenever I use a newly created variable in the condition.

I am a beginner in R, so please tell me whether this way of coding is bad. In Stata, I am used to writing something like

replace x = 4 if (a == 1 | b ==3 ) & c != 8

But I now get the feeling that R users don't code this way. Thank you in advance for any advice.
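
For reference, the Stata line above maps fairly directly onto base R logical indexing. The following is only a minimal sketch: the data frame `df` and its values are made up for illustration, and the column names simply mirror the Stata line.

```r
# Hypothetical toy data; column names a, b, c, x mirror the Stata line above.
df <- data.frame(a = c(1, 2, 1, 5),
                 b = c(0, 3, 3, 0),
                 c = c(8, 2, 7, 8),
                 x = c(0, 0, 0, 0))

# Stata: replace x = 4 if (a == 1 | b == 3) & c != 8
# with() evaluates the condition inside df, so attach() is not needed.
cond <- with(df, (a == 1 | b == 3) & c != 8)

# which() keeps only the rows where the condition is TRUE and skips NA rows.
df$x[which(cond)] <- 4

df$x
```

The condition is built once and then reused as an index, which is close in spirit to Stata's `if` qualifier.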

DXC
  • Your problem is almost certainly that you are using `attach`. Don't do it. It is associated with many problems. Instead use `with` or simply type out the full names. You will avoid such headaches. – lmo Aug 04 '17 at 14:48
  • It would be easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the desired output data. Also it's generally not a good idea to use `attach()` in your R code. For mutations like this, you might also consider the `dplyr` library to make things cleaner. – MrFlick Aug 04 '17 at 14:49
  • @lmo @MrFlick Thanks. I will avoid `attach()` and try `with` and `dplyr`. – DXC Aug 04 '17 at 14:56
  • Have you checked that the variable is actually being created the way you expect (e.g. do a table and then a table with hktype)? – Elin Aug 04 '17 at 15:50
  • @Elin Yes, I am pretty sure `hk_nonagr` is created correctly. I compared it with what I got in Stata. – DXC Aug 04 '17 at 18:12
  • So if you type `sum(originaldata$hk_nonagr == 0)` you get the correct number? There is no reason that using the new variable wouldn't work. Try making a minimal reproducible sample using the mpg data. – Elin Aug 04 '17 at 18:16
  • @Elin I found the problem. If I write `originaldata$hk_nonagr == "no"`, it works. I guess that because I have converted `hk_nonagr` to a factor, I need to use the labels instead of the numeric values? This is very strange. In Stata, adding labels doesn't change the type of the data. It seems R works differently. I didn't realize this. – DXC Aug 05 '17 at 08:08
  • Thinking about "labels" differently is one of the key differences between other statistical applications and R. If a variable is a factor, it is a factor; the fact that it may be stored as a number for efficiency does not change this. Also, if you really want a dummy variable, don't use 0 and 1; use appropriate text. R handles dummy variables in models for you if you have defined the variables as factors, and the labels will display the effects (relatively) nicely. – Elin Aug 06 '17 at 03:52
  • @Elin Thanks a lot. This is very informative. – DXC Aug 07 '17 at 09:05
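
The factor behaviour worked out in the comments can be reproduced on a small example. This is a minimal sketch: the vector `raw` is made-up data standing in for `hk_nonagr`, and the `factor()` call copies the levels and labels used in the question.

```r
# Made-up values standing in for hk_nonagr before it is turned into a factor.
raw <- c(1, 0, 0, 1, NA)
f <- factor(raw, levels = c(1, 0), labels = c("yes", "no"))

# Comparing a factor with 0 compares the labels with "0", so nothing matches.
n_zero <- sum(f == 0, na.rm = TRUE)

# Comparing against the label finds the rows that were originally 0.
n_no <- sum(f == "no", na.rm = TRUE)

# The internal integer codes follow the order of the levels, not the old values.
codes <- as.integer(f)

n_zero
n_no
codes
```

Here `n_zero` is 0 and `n_no` is 2, which matches what was observed with `originaldata$hk_nonagr == 0` versus `== "no"`. The codes come out as 1 for "yes" and 2 for "no", so converting the factor back with `as.integer()` does not recover the original 1/0 coding either.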

1 Answer

With `dplyr` you could combine `mutate` with `ifelse` to achieve what you are trying to do.

Here you can check how to use `dplyr` for conditional mutating.

The code:

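library(dplyr)

# ifelse() needs all three arguments (test, yes, no);
# NA is returned where neither condition holds.
originaldata <- originaldata %>% mutate(hk_nonagr = ifelse(hktype == 2 | 
                hktype == 4 | hktype == 5, 1,
                ifelse(hktype == 1 | hktype == 3 | hktype == 6 | hktype == 7, 
                0, NA))) 

# hk_nonagr is numeric here (it has not been converted to a factor),
# so comparing it with 0 works.
originaldata <- originaldata %>% mutate(hk_effort = ifelse(urbanhk == 
                1 | urbanhk == 2 | urbanhk == 3 | urbanhk == 4 | urbanhk == 
                7, 1,
                ifelse(hk_nonagr == 0 | yr_urbanhk == 9997 | r_urbanhk == 
                5 | r_urbanhk == 6 | r_urbanhk == 8 | r_urbanhk == 9, 0, NA))) 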

This is one way of doing it; there are other options.
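
One such option, sketched here in base R without `dplyr`, uses `%in%` in place of the long chains of `==` and `|`, and `with()` in place of `attach()`. The toy data frame is made up for illustration; only the variable name `hktype` and its codes come from the question.

```r
# Hypothetical toy data; only the column name hktype is taken from the question.
originaldata <- data.frame(hktype = c(1, 2, 3, 4, 5, 6, 7, NA))

# %in% tests membership in a set of codes; rows matching neither set become NA.
originaldata$hk_nonagr <- with(originaldata,
  ifelse(hktype %in% c(2, 4, 5), "yes",
         ifelse(hktype %in% c(1, 3, 6, 7), "no", NA)))

# Store the result as a factor with text labels from the start.
originaldata$hk_nonagr <- factor(originaldata$hk_nonagr,
                                 levels = c("yes", "no"))

table(originaldata$hk_nonagr, useNA = "ifany")
```

Later conditions then compare against the label, for example `originaldata$hk_nonagr == "no"`, rather than against 0.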

csmontt