I have a dummy variable black, where black==0 is White and black==1 is Black. I am trying to fit a linear model with lm for the black==1 category only, but running the code below gives me incorrect coefficients. Is there a way in R to restrict a model to a subset of observations, similar to the if qualifier in Stata?

library(foreign)
df<-read.dta("hw4.dta")
attach(df)
black[black==0]<-NA
model3<-lm(rent~I(income^2)+income+black)
monarque13

3 Answers

3

It looks like there are a few issues here. First, you've stored all your data in separate vectors rent, income and black. You should instead store it in a data frame:

data <- data.frame(rent, income, black)

To limit a data frame based on a logical expression, you can use the subset function:

data.limited <- subset(data, black == 1)

Finally, you can run your analysis on your limited data frame (presumably without the black variable):

model3 <- lm(rent~I(income^2)+income, data=data.limited)
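
The steps above can be tried end to end with a small sketch. The data here are simulated stand-ins for hw4.dta (the values and coefficients are invented for illustration, not the asker's data), and black is assumed to be a numeric 0/1 column:

```r
# Simulated stand-in for hw4.dta (hypothetical values, not the asker's data)
set.seed(1)
n <- 200
df <- data.frame(
  income = runif(n, 10, 100),
  black  = rbinom(n, 1, 0.4)
)
df$rent <- 200 + 5 * df$income - 0.02 * df$income^2 +
  30 * df$black + rnorm(n, sd = 20)

# Keep only the rows where black == 1
data.limited <- subset(df, black == 1)

# black is constant in the limited data, so it is left out of the formula
model3 <- lm(rent ~ I(income^2) + income, data = data.limited)

coef(model3)   # intercept, I(income^2), income
nobs(model3)   # number of rows actually used in the fit
```

Comparing nobs(model3) with sum(df$black == 1) is a quick way to confirm that the model was fit on the intended group only.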
josliber
  • also, subset can be used within the lm call --- lm(...,subset=black==1) – Steve Reno Feb 28 '14 at 18:49
  • I'm slightly confused. I just added some more to my above code. Does this still apply if I have my data attached? – monarque13 Feb 28 '14 at 18:52
  • I think most would agree that using attach() is generally a bad idea. It is better to leave your data in the data frame df and use df$variable calls for specific variables. model3<-lm(df$rent~I(df$income^2)+df$income,subset=df$black==1) should provide the results you're looking for. – Steve Reno Feb 28 '14 at 18:55
  • Others who are wiser than I have suggested avoiding subset() in code (it is more for on-the-fly use in the console), so I've tried to get in the habit of just using '['. Thus: `lm(rent~I(income^2)+income, data=data[data[,"black"]==1,])` – rbatt Feb 28 '14 at 19:25
  • I think my coding scheme is messed up because I cleaned my categories in Stata before importing the data set into R. None of the suggestions seem to work because `levels(black)` reveals `[1] "White" "Black"`. Not sure how to remedy this. – monarque13 Feb 28 '14 at 19:35
  • @rbatt `subset` is fine for interactive use. It is better to avoid it inside functions and loops and just stick with `[` – rawr Feb 28 '14 at 19:43
  • @user3339295 Well, if you have a factor simply use `data.limited <- df[df$black=="Black",]`. – Roland Feb 28 '14 at 20:59
  • I hadn't heard anything negative about using `subset` in code, so I'm interested in hearing more about this. I get that it creates a new copy of part of my data, which in some cases is inefficient. However, would there be any benefit here of using `data.limited <- df[df$black == 0,]` instead of `data.limited <- subset(data, black == 0)`? Could you clarify the cases in which it's best to avoid `subset`? – josliber Mar 01 '14 at 00:15
  • @josilber http://stackoverflow.com/q/9860090/1412059 – Roland Mar 01 '14 at 13:47
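
The factor situation raised in the comments can be reproduced with a short sketch. It assumes black arrived as a factor with levels "White" and "Black" (read.dta converts Stata value labels to factors by default); the data are simulated and purely illustrative:

```r
# Simulated data with black stored as a labelled factor (hypothetical values)
set.seed(2)
n <- 100
df <- data.frame(
  income = runif(n, 10, 100),
  black  = factor(sample(c("White", "Black"), n, replace = TRUE),
                  levels = c("White", "Black"))
)
df$rent <- 300 + 4 * df$income + rnorm(n, sd = 25)

levels(df$black)     # the labels, not 0/1
sum(df$black == 1)   # a factor is compared by label, so nothing matches 1

# Select on the label instead of the numeric code
black.only <- df[df$black == "Black", ]
model3 <- lm(rent ~ I(income^2) + income, data = black.only)
coef(model3)
```

Because the comparison against 1 never matches, a line such as black[black==0]<-NA silently changes nothing when black is a factor, which is one plausible reason the earlier suggestions appeared not to work.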
3

Why not subset the data before running the model? I personally prefer using a dataframe rather than separate vectors which will make the subsetting easier.

df <- data.frame(rent, income, black)

Then subset the data frame, or create another one:

df <- df[df$black==1,]

And run the model

model3 <- lm(rent ~ I(income^2) + income, data=df)
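
One caveat with bracket subsetting is worth a small illustration: if black contains missing values (as it would after a line like black[black==0]<-NA), the logical index contains NA and the result picks up all-NA rows. The toy data frame below is invented for the example:

```r
# Toy data with one missing value in black (hypothetical values)
df <- data.frame(
  rent   = c(500, 650, 700, 820),
  income = c(20, 35, 40, 60),
  black  = c(1, NA, 1, 0)
)

by_bracket <- df[df$black == 1, ]         # NA in the condition yields an all-NA row
by_which   <- df[which(df$black == 1), ]  # which() keeps only the TRUE positions
by_subset  <- subset(df, black == 1)      # subset() also treats NA as FALSE

nrow(by_bracket)
nrow(by_which)
nrow(by_subset)
```

With its default settings lm drops rows containing NA before fitting, so the stray row mostly matters when you inspect or count the subsetted data rather than for the coefficients themselves.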
eclark
2

The code written below should do it.

# black is constant within the subset, so it is left out of the formula
model3 <- lm(rent~I(income^2)+income, data=df, subset=black==1)
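
As a rough check, the subset argument of lm can be compared against subsetting the data frame beforehand. This sketch uses simulated data with a numeric 0/1 black column (an assumption; the values are invented) and leaves black out of the formula because it does not vary within the selected rows:

```r
# Simulated data (hypothetical values, not the asker's data set)
set.seed(3)
n <- 150
df <- data.frame(income = runif(n, 10, 100), black = rbinom(n, 1, 0.5))
df$rent <- 250 + 6 * df$income - 0.03 * df$income^2 + rnorm(n, sd = 15)

# Closest analogue of Stata's "if": the subset expression is evaluated in df
fit_arg <- lm(rent ~ I(income^2) + income, data = df, subset = black == 1)

# Same model on a data frame that was filtered first
fit_pre <- lm(rent ~ I(income^2) + income, data = df[df$black == 1, ])

# Compare the two sets of coefficients
all.equal(coef(fit_arg), coef(fit_pre))
```

Since the subset expression is evaluated inside data, writing subset = black == 1 is enough; the df$ prefix is not needed there.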
Nikos
Isidro Jr