
I have data on metropolitan areas and want to extract the city information.

An example is

test <- c("Akron, OH METRO AREA","Auburn, NY Micro Area","Boston-Cambridge, MA-NH")

And I want it to look like

"Akron, OH", "Auburn, NY", "Boston-Cambridge, MA"

So, just the "City, State" part.

MrFlick
user3304359

2 Answers


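An option is sub from base R: match a comma followed by one or more spaces (\\s+) and one or more upper-case letters ([A-Z]+), capture that as a group ((...)), and match the rest of the string with .*; in the replacement, specify the backreference (\\1) of the captured group. The city name before the comma is not part of the match, so it is kept as-is.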

sub("(,\\s+[A-Z]+).*", "\\1", test)
#[1] "Akron, OH"            "Auburn, NY"           "Boston-Cambridge, MA"
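A stricter variant of the same sub() idea, offered as a sketch: it anchors the pattern to the whole string and keeps exactly two upper-case letters after the comma. This assumes every entry has a two-letter state code directly after the first comma; the name city_state is only illustrative.

```r
# Sample data from the question
test <- c("Akron, OH METRO AREA", "Auburn, NY Micro Area", "Boston-Cambridge, MA-NH")

# ^([^,]+,\\s+[A-Z]{2}) captures everything up to the first comma, the comma,
# the following whitespace and exactly two upper-case letters (assumed to be
# the state code); .*$ matches the remainder, which is then dropped.
city_state <- sub("^([^,]+,\\s+[A-Z]{2}).*$", "\\1", test)
city_state
#[1] "Akron, OH"            "Auburn, NY"           "Boston-Cambridge, MA"
```

Entries that do not match the pattern (for example, ones without a comma) are returned unchanged by sub(), so they are easy to spot in the output.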
akrun

An easy option is stringr::str_extract:

test <- c("Akron, OH METRO AREA","Auburn, NY Micro Area","Boston-Cambridge, MA-NH")
stringr::str_extract(test, "[^,]+, .{0,2}")
# [1] "Akron, OH"            "Auburn, NY"           "Boston-Cambridge, MA"

This matches one or more characters that are not a comma, then a comma and a space, then up to two more characters (the state abbreviation).
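If stringr is not available, the same pattern can be used with base R's regexpr() and regmatches(). This is a sketch of a swapped-in base R equivalent, not part of the stringr approach; the variable name city_state is illustrative.

```r
test <- c("Akron, OH METRO AREA", "Auburn, NY Micro Area", "Boston-Cambridge, MA-NH")

# regexpr() finds the start and length of the first match in each element;
# regmatches() then extracts those matched substrings.
city_state <- regmatches(test, regexpr("[^,]+, .{0,2}", test))
city_state
# [1] "Akron, OH"            "Auburn, NY"           "Boston-Cambridge, MA"
```

One difference to keep in mind: str_extract() returns NA for elements with no match, while regmatches() with regexpr() silently drops them, so the result can be shorter than the input.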

MrFlick
  • Thanks! I always forget stringr cause I don't have much experience with regex. Makes sense! – user3304359 Aug 27 '19 at 20:09
  • Another one for you? If I have "Virginia Beach-Norfolk-Newport News, VA", how can I make it into three rows: "Virginia Beach, VA", "Norfolk, VA", "Newport News, VA"? – user3304359 Aug 27 '19 at 20:11
  • @user3304359 That's a different issue than what you've described here. Maybe open up a different question. – MrFlick Aug 27 '19 at 20:14
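For the follow-up about splitting a multi-city name into one entry per city, a minimal base R sketch for a single string might look like this. It assumes hyphens only ever separate city names and that the state follows the last comma; the variable names are illustrative.

```r
x <- "Virginia Beach-Norfolk-Newport News, VA"

# Everything before the first comma is the hyphen-separated list of cities
cities <- sub(",.*$", "", x)
# Everything after the last comma (and any following spaces) is the state
state <- sub("^.*,\\s*", "", x)

# Split the city part on "-" and paste the state back onto each piece
split_cities <- paste0(strsplit(cities, "-", fixed = TRUE)[[1]], ", ", state)
split_cities
# [1] "Virginia Beach, VA" "Norfolk, VA"        "Newport News, VA"
```

The result is a character vector with one element per city. Names that are themselves hyphenated (such as Winston-Salem) would be split incorrectly, and a multi-state suffix like "MA-NH" would be pasted onto every city as-is.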