0

I have an existing dataset that contains zip codes for each participant. I am trying to add relevant demographic data for each zip code to the set and running into issues.

I was looking at a loop but did not fully understand it. I've started using if_else but that's also not quite working for what I'd like to do.

Essentially what I need to do is add a column for the relevant demographic, and fill the column with values corresponded to already existing values.

As an example, for the zipcode 25911, 0.857 of the population is white. In zipcode 41041, 0.952 of the population is white. And then there are a lot more to follow... I have 37 zip codes with 9 values to enter for each zip code.

Here's what I'm working with so far:

data$White <- if_else(data$Zipcode == 25911, 0.857, 0,
                  if_else(data$Zipcode ==41041, 0.952, 0,
                         if_else(data$Zipcode ==41042, 0.917, 0)))

I know that the 0 at the end tells it to fill in a 0 for every other value, but can I get it to essentially stack and build? So that first round, fill in the value for White every time the zip code is 25911, second time fill in the value for White every time the zip code is 41041, etc. Should I try a loop instead?

All help is greatly appreciated! I'm still very new to R and understanding how to best use it.

  • It is probably easier to set up a zipcode/white proportion dataframe and use `match` or `merge`. – Andrew Gustar Jan 22 '20 at 23:06
  • See this post for more details on how to do different types of `merge` https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – Ronak Shah Jan 22 '20 at 23:57

1 Answers1

0

Yes, as Andrew just said, the best way is probably to build first a look-up table (=LUT) => buid a dataframe with two columns (zip_code and value) such as like that :

# Sample data set (because we don't have yours):
data = rbind.data.frame(cbind("dummy"="AA", "Zipcode"="41042"),
                        cbind("dummy"="BB", "Zipcode"="25911"),
                        cbind("dummy"="CC", "Zipcode"="41041"),
                        cbind("dummy"="DD", "Zipcode"="41038"),
                        cbind("dummy"="EE", "Zipcode"="05077"),
                        stringsAsFactors=F)
data_copy = data

# Build the LUT dataframe (here, only 2 vars with 3 records, but you can put 9 vars with 37 records if you need):
LUT = rbind.data.frame(cbind("zip_code"="25911", "White"=0.857, "Yellow"=0.112),
                       cbind("zip_code"="41041", "White"=0.952, "Yellow"=0),
                       cbind("zip_code"="41042", "White"=0.917, "Yellow"=NA), stringsAsFactors=F)

# Then, I load the dplyr library to get available the left_join function:
library(dplyr)

# Then, to get all the columns (here, White and Yellow) in the data object:
data <- left_join(data, LUT, by=c("Zipcode"="zip_code"))

# Or, instead of the previous line code, to get for example only the White column in the data object:
data_copy <- left_join(data_copy, LUT[, c("zip_code", "White")], by=c("Zipcode"="zip_code"))

# If you want to replace all the NA by 0 in a specific column (here, White), do this :
data_copy$White <- ifelse(is.na(data_copy$White), 0, data_copy$White)

NB : you can use the merge function instead of the left_join if you prefere.

Syl33
  • 97
  • 5