
I have a vector of addresses, like so:

address <- c("890 layton drive, wilmington de 19805", 
"227 weehawken place suite 145, comstock ny 78956", 
"13 airport highway, new castle de 19720", 
"3640 New Hampshire Avenue NW Apt 207, Washington DC 20011")

As you can see, each of the addresses contains words such as "drive", "place" and "suite". I would like to replace those words with abbreviations taken from a dictionary vector of sorts. I was using the mapvalues function from the plyr package to create my own function like so:

sweet <- function(x) mapvalues(x, c("plaza", "street", "suite", "drive", "boulevard", "place",
                               "south", "north", "west", "east", "square", "avenue", "road",
                               "floor", "parkway", "circle", "highway"), 
                          c("plz", "st", "ste", "dr", "blvd", "pl",
                            "s", "n", "w", "e", "sq", "ave", "rd",
                            "flr", "pkwy", "cir", "hwy"))

My desired output is,

address <- c("890 layton dr, wilmington de 19805", 
"227 weehawken pl ste 145, comstock ny 78956", 
"13 airport hwy, new castle de 19720", 
"3640 New Hampshire Ave NW, Washington DC 20011")

But whenever I apply the function,

address <- sapply(address, sweet)

I get the error:

The following `from` values were not present in `x`: plaza, street, suite, drive, boulevard, place, south, north, west, east, square, avenue, road, floor, parkway, circle, highway

I figure the problem is that mapvalues only matches whole elements, e.g. swapping "a" for "A" works when the element is exactly "a", but not when it is "a is the first letter". Is there a way around this? The solution need not come from plyr or dplyr; anything fairly efficient will work. Any advice is appreciated. Thanks.

jvalenti

1 Answer


Try stringr::str_replace_all, which accepts a named vector for multiple replacements: the names are the patterns and the values are the replacements.

patterns = c("plaza", "street", "suite", "drive", "boulevard", "place", "south", "north", 
             "west", "east", "square", "avenue", "road", "floor", "parkway", "circle", 
             "highway")
replacement = c("plz", "st", "ste", "dr", "blvd", "pl", "s", "n", "w", "e", "sq", "ave", 
             "rd", "flr", "pkwy", "cir", "hwy")

stringr::str_replace_all(address, setNames(replacement, patterns))
#[1] "890 layton dr, wilmington de 19805"                    
#[2] "227 weehawken pl ste 145, comstock ny 78956"           
#[3] "13 airport hwy, new castle de 19720"                   
#[4] "3640 New Hampshire Avenue NW Apt 207, Washington DC 20011"
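
One caveat with the plain patterns: they are matched anywhere in the string, including inside longer words. A small sketch of the effect, using base R's gsub to stay dependency-free (the address "12 eastwood street" is a made-up example, not from the question):

```r
# Hypothetical address, chosen so that "east" occurs inside a longer word
x <- "12 eastwood street"

# Unanchored pattern: "east" also matches inside "eastwood"
unanchored <- gsub("east", "e", x)

# Word boundaries (\\b) restrict the match to the standalone word "east",
# so "eastwood" is left alone
anchored <- gsub("\\beast\\b", "e", x, perl = TRUE)

print(unanchored)
print(anchored)
```

The unanchored call mangles "eastwood", while the anchored one leaves the string unchanged because "east" never appears as a whole word.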

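To also ignore case and match whole words only, prefix each pattern with the (?i) modifier and wrap it in word boundaries (\\b). Note that the replacements are inserted as written, so a matched "Avenue" becomes lowercase "ave":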

stringr::str_replace_all(address, setNames(replacement, paste0('(?i)\\b', patterns, '\\b')))
#[1] "890 layton dr, wilmington de 19805"                    
#[2] "227 weehawken pl ste 145, comstock ny 78956"           
#[3] "13 airport hwy, new castle de 19720"                   
#[4] "3640 New Hampshire ave NW Apt 207, Washington DC 20011"
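
If you would rather avoid the stringr dependency, the same idea can be sketched in base R by looping over the dictionary with gsub. This is only a minimal sketch under the same assumptions as above (whole-word, case-insensitive matching); the helper name abbreviate_address is made up for illustration.

```r
address <- c("890 layton drive, wilmington de 19805",
             "227 weehawken place suite 145, comstock ny 78956",
             "13 airport highway, new castle de 19720",
             "3640 New Hampshire Avenue NW Apt 207, Washington DC 20011")

patterns <- c("plaza", "street", "suite", "drive", "boulevard", "place",
              "south", "north", "west", "east", "square", "avenue", "road",
              "floor", "parkway", "circle", "highway")
replacement <- c("plz", "st", "ste", "dr", "blvd", "pl",
                 "s", "n", "w", "e", "sq", "ave", "rd",
                 "flr", "pkwy", "cir", "hwy")

# Hypothetical helper: applies each pattern/replacement pair in turn.
# \\b keeps matches to whole words; ignore.case = TRUE also catches "Avenue".
abbreviate_address <- function(x, patterns, replacement) {
  stopifnot(length(patterns) == length(replacement))
  for (i in seq_along(patterns)) {
    x <- gsub(paste0("\\b", patterns[i], "\\b"), replacement[i], x,
              ignore.case = TRUE, perl = TRUE)
  }
  x
}

result <- abbreviate_address(address, patterns, replacement)
print(result)
```

Because gsub is vectorised over x, the loop runs once per dictionary entry rather than once per address, which should stay reasonably fast for long address vectors.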
Psidom