I have hundreds of addresses in a data frame variable and need to extract zip codes from them. Some of the addresses contain more than one city, each with a zip code. Here is a mock example of a data frame and R code that extracts the zip codes.
require(qdapRegex)
require(stringr)
df <- data.frame(address = c("Walnut; 94596, Ontario, 91761, Beach, CA 90071", "Irvine Cal 92164"), var2 = "text")
df$zip.Rinker <- sapply(df$address, FUN = rm_zip, extract=TRUE)
The rm_zip
function from Tyler Rinker's qdapRegex
package extracts all the zip codes and puts them in list if there is more than one.
> df
address var2 zip.Rinker
1 Walnut; 94596, Ontario, 91761, Beach, CA 90071 text 94596, 91761, 90071
2 Irvine Cal 92164 text 92164
How can R create a new row for each zip code that is in row 1 under zip.Rinker? Something like the following would be ideal. Note, there will be dozens of addresses that have multiple zip codes, so I am hoping for a solution that doesn't require manual steps.
address var2 zip.Rinker
1 Walnut; 94596, Ontario, 91761, Beach, CA 90071 text 94596
2 Walnut; 94596, Ontario, 91761, Beach, CA 90071 text 91761
3 Walnut; 94596, Ontario, 91761, Beach, CA 90071 text 90071
4 Irvine Cal 92164 text 92164
Thank you for time.
PS Using stringr
, this code extracts zip codes and presents the same challenge.
df$zip.stringr <- str_extract_all(string = df$address, pattern = "\\d{5}")