1

Using dplyr in R, I'm trying to add a new column based on the attributes of another column. For example, I have a data frame with thousands of rows of state codes (like table1). Now I want to add a new column called Region and assign the state codes to that region (like table2). How would that be done in dplyr?

table1 <- data.frame(State = c('NY','IL','CA','PA','FL','MI','AZ'))

table2 <- data.frame(State = c('NY','IL','CA','PA','FL','MI','AZ'),
                     Region = c('Northeast','Midwest','West','Northeast','Southeast','Midwest','West'))
Nate
  • 10,361
  • 3
  • 33
  • 40
brinbrin
  • 11
  • 1
  • do you have a way to lookup which region each state is in? maybe a second table or named vector... – Nate May 15 '18 at 13:39
  • I think that's what `table2` is supposed to be – divibisan May 15 '18 at 13:45
  • ahh that could be, I assumed that was the expected output – Nate May 15 '18 at 13:50
  • Yes, the end result will look like table2, but can I create a new column called "Region" (i.e. with 10 regions) and assign each of the 50 state codes to one of the ten regions. – brinbrin May 15 '18 at 13:51
  • yes if you have a place to find the value pairings. I usually use a named vector for things like this...`vec <- c("Midwest", "West") %>% set_names(c("MI","CA")); table1 %>% mutate(Region = vec[State])` – Nate May 15 '18 at 13:57

1 Answers1

1

This is a JOIN problem. Just use left_join from the dplyr package. In the example below I reordered the states in table1 to show that it can translate them regardless of the order:

library(dplyr)
table1 <- data.frame(State = c('PA','FL','MI','AZ','NY','IL','CA'))
table2 <- data.frame(State = c('NY','IL','CA','PA','FL','MI','AZ'),
                     Region = c('Northeast','Midwest','West','Northeast','Southeast','Midwest','West'))

left_join(table1, table2, by = "State")
  State    Region
1    PA Northeast
2    FL Southeast
3    MI   Midwest
4    AZ      West
5    NY Northeast
6    IL   Midwest
7    CA      West
divibisan
  • 11,659
  • 11
  • 40
  • 58