-4

I have 2 data frames, one contains data with postal code, the other contains regions with a set of postal codesenter image description here

I want to add a "Regions" column in Dataframe 1, based on the postal codes, how can I do that? (note: regions in dataframe 2 can contains several postal codes.

Thanks for help.

1 Answers1

1

This can be solved with dplyr and tidyr. I'm sure there are other solutions too.

# create the data
df1 <- data.frame(pcodes = c(1001, 1002, 1003))
df2 <- data.frame(regions = c(1, 2), 
                  pcodes = c("1001, 1002, 1003", "1004, 1005"),
                  stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)

# separate postcodes column and reshape long
# (from https://stackoverflow.com/a/33288868/2633645)
df2 <- df2 %>% 
  mutate(to = strsplit(pcodes, split = ",")) %>% 
  unnest(to) %>% 
  mutate(to = as.numeric(to)) %>% 
  select(-pcodes) %>% 
  rename(pcodes = to) # rename `to` to `pcodes` for join purpose

# join the data sets by the common variable pcodes
df_both <- left_join(df1, df2)
df_both

  pcodes regions
1   1001       1
2   1002       1
3   1003       1
r.bot
  • 5,309
  • 1
  • 34
  • 45