
Allow me to preface this by saying that I am new to R. I cleaned some income and rent variables and now I am trying to recode my race variable from 9 categories to 2. The original variable is coded as follows:

1 = White, 2 = Black, 3 = Native, 4 = Asian, 5 = A, 6 = B, 7 = C, 8 = D, 9 = E. I'm basically trying to drop all other races and keep only White and Black as a dummy variable, where White = 0 and Black = 1. Here's the code:

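```r
library(foreign)
library(ggplot2)

df <- read.dta("acs2010.dta")
View(df)
summary(df)

# Modify the columns of df directly: after attach(df), an assignment such as
# inctot[inctot == 9999999] <- NA creates a separate copy and leaves df unchanged
df$inctot[df$inctot == 9999999] <- NA   # treat 9999999 as missing
df$inctot[df$inctot <= 0] <- NA         # treat zero or negative income as missing
summary(df$inctot)
df$incomesq <- df$inctot^2

df$rent[df$rent == 0] <- NA             # treat zero rent as missing
summary(df$rent)

# Rename the first two levels; levels set to NA are dropped
# and the observations that had them become NA
levels(df$race)[1] <- "White"
levels(df$race)[2] <- "Black"
levels(df$race)[3:9] <- NA
levels(df$race)

ggplot(data = df, aes(x = race)) + geom_bar()
View(df)   # base R's View() is capitalized; lowercase view() is not part of base R
```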

Manipulating the levels leaves me with "White" and "Black", but when I plot the variable, it shows the NAs as well. I'm not sure how to get rid of NAs in factor variables. Any ideas would be appreciated.
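To see what the level recode actually does, here is a minimal sketch on a small made-up factor (`race_toy` and its values are invented for illustration, not taken from acs2010.dta). Assigning `NA` to levels removes those levels and turns the matching observations into `NA`; `table()` hides them by default, while `useNA = "ifany"` counts them, which is why they appear as an extra bar in the plot.

```r
# Hypothetical toy data: a factor with the nine race categories from the question
race_toy <- factor(c("White", "Black", "Native", "Asian", "A", "White", "E"),
                   levels = c("White", "Black", "Native", "Asian",
                              "A", "B", "C", "D", "E"))

# Same recode as in the question: keep the first two levels, set the rest to NA
levels(race_toy)[3:9] <- NA

levels(race_toy)                   # only "White" and "Black" remain
table(race_toy)                    # NAs are left out by default
table(race_toy, useNA = "ifany")   # NAs get their own cell
sum(is.na(race_toy))               # number of observations recoded to NA
```

So the recode itself is doing its job; the remaining `NA` values just have to be excluded before plotting or modelling.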

monarque13
  • I would suggest changing the title of this question: it really seems to be about omitting missing data from the plot rather than about recoding a factor as a dummy variable. A better title will be more helpful to future readers. – anandthakker Feb 21 '14 at 23:46
  • Sorry about making it sound like it was a plotting issue. I'm actually struggling with creating a dummy variable from a categorical variable, and the plot made me aware of the problem. – monarque13 Feb 22 '14 at 00:18
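If the Stata value labels are not needed, `read.dta()` can keep `race` as its numeric codes (`convert.factors = FALSE`), and the 0/1 dummy can then be built straight from the coding listed in the question (1 = White, 2 = Black). A minimal sketch, assuming numeric codes; `race_code` is a made-up vector and `black` is a hypothetical variable name:

```r
# Hypothetical numeric race codes using the question's scheme
# (1 = White, 2 = Black, 3-9 = other categories)
race_code <- c(1, 2, 3, 1, 9, 2, 4)

# White -> 0, Black -> 1, every other category -> NA
black <- ifelse(race_code == 1, 0L,
                ifelse(race_code == 2, 1L, NA_integer_))

black
table(black, useNA = "ifany")
```

The nested `ifelse()` mirrors a Stata-style `recode`; on the real data the same line would be applied to `df$race` after reading it with `convert.factors = FALSE`.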

1 Answer


The approach in the question to recoding the race factor looks fine.

It seems that the real problem here was omitting the NAs from the plot. Just subset the data frame:

```r
ggplot(data = df[!is.na(df$race), ], aes(x = race)) + geom_bar()
```
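The same filtering can be checked without ggplot2. A minimal sketch on a made-up data frame (`df_toy`, `df_race`, and `df_race2` are hypothetical names): `!is.na()` or `subset()` keeps only rows with a known race, and `droplevels()` removes any factor levels that no longer occur.

```r
# Hypothetical toy data frame standing in for df after the race recode
df_toy <- data.frame(
  race = factor(c("White", "Black", NA, "White", NA),
                levels = c("White", "Black")),
  rent = c(500, 650, 700, NA, 400)
)

# Keep only rows where race is observed (same logic as df[!is.na(df$race), ])
df_race <- df_toy[!is.na(df_toy$race), ]

# Equivalent with subset(); droplevels() drops levels with no remaining rows
df_race2 <- droplevels(subset(df_toy, !is.na(race)))

nrow(df_race)            # 3 rows left
table(df_race$race)      # White 2, Black 1

# na.omit() is broader: it also drops the row with a missing rent
nrow(na.omit(df_toy))    # 2 rows left
```

Filtering on `race` alone is the safer choice here, since `na.omit()` would also discard observations that are missing income or rent but have a valid race.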

Further reading:

anandthakker
  • Thanks for your suggestion, the plot worked just fine! I am familiar with other statistical packages, such as Stata, that have easy recoding procedures. Is this the easiest way to create dummy variables in R? I know that a factor variable is advantageous and requires no extra work when running regression models, but for some reason creating dummies seems like a lot of work in R. – monarque13 Feb 22 '14 at 00:21
  • I don't have tons of experience with it, but from the Cookbook-R page on [recoding data](http://www.cookbook-r.com/Manipulating_data/Recoding_data/), it looks fairly reasonable... – anandthakker Feb 22 '14 at 01:51
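On the follow-up about dummies: once `race` is a two-level factor, a 0/1 indicator is a one-line comparison, and `NA` stays `NA`. `model.matrix()` shows the dummy column R builds on its own when the factor enters a model formula (as in `lm()`), with the first level, White, as the reference. A minimal sketch on a made-up factor (`race_toy`, `black`, and `mm` are hypothetical names):

```r
# Hypothetical two-level factor, as produced by the recode in the question
race_toy <- factor(c("White", "Black", NA, "White", "Black"),
                   levels = c("White", "Black"))

# 0/1 dummy: the logical comparison converted to integer; NA stays NA
black <- as.integer(race_toy == "Black")
black

# Dummy coding R generates internally for a factor in a formula;
# with the default na.action, the row with missing race is dropped
mm <- model.matrix(~ race_toy)
mm
```

In practice the explicit dummy is mostly useful for tabulating or exporting; for regression, passing the factor itself to `lm()` produces the same White = 0, Black = 1 coding.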