I have a dataset that contains the ZIP code, as well as 2 indicator variables denoting the race and ethnicity of the individual, as well as the count for each combination. The data can be accessed as follows:
Data1990 <- read.table("https://www2.census.gov/programs-surveys/popest/tables/1990-2000/intercensal/st-co/stch-icen1990.txt")
names(Data1990)<-c("Year","ZIP","Age","Race","Ethnic","Count")
Data1990<-Data1990[,c(2,4:6)]
The data looks like this:
ZIP Race Ethnic Count
1 1001 1 1 239
2 1001 2 1 203
3 1001 1 1 821
4 1001 2 1 769
5 1001 1 1 1089
6 1001 2 1 961
I would like to create a new dataset which contains the proportion of white non-hispanics (Race = 1 or 2 and Ethnicity=1) in that particular ZIP code. I tried getting the sum of the white non-hispanics for each zip code by using a forloop. However, this code just gives the Variable "White" a 0 for each observation.
zip<-unique(Data1990$ZIP)
Data1990New<-data.frame(zip,White=NA)
for(i in zip){
Data1990New[which(zip==i),]$White <- sum(Data1990[(Data1990$Race==1 |Data1990$Race==2)&Data1990$Ethnic==1&
Data1990$ZIP==Data1990New$i,][,4])
}
head(Data1990New)
I would like to do this in a more efficient way since forloops are slow. This is relevant but I think that my question is slightly different since I have an ID variable.
Thanks in advance!