I am working with geographical data, and have a string column in my data which lists a state name, followed by a comma, followed by a county name. How can I eliminate everything before the comma?

So this:

Obs    county
1      wisconsin, waukesha
2      oklahoma, tulsa
3      alabama, shelby
4      virginia, montgomery

Should become this:

Obs    county                  newcounty
1      wisconsin, waukesha     waukesha
2      oklahoma, tulsa         tulsa
3      alabama, shelby         shelby
4      virginia, montgomery    montgomery

I understand that similar questions have been asked before, but the ones I found on Stack Overflow all ask how to delete everything after the comma rather than before it.

887
  • https://stackoverflow.com/questions/12297859/remove-all-text-before-colon https://stackoverflow.com/questions/64341291/how-to-remove-all-text-before-a-pattern https://stackoverflow.com/questions/9704213/remove-part-of-a-string https://stackoverflow.com/questions/46274010/delete-parts-of-a-character-vector/46274024 https://stackoverflow.com/questions/7185071/remove-everything-before-period https://stackoverflow.com/questions/24400094/eliminate-characters-before-a-pattern-in-r – thelatemail Nov 22 '20 at 21:18

2 Answers

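We can match any characters (.*) up to the comma, followed by zero or more whitespace characters (\\s*), and replace the match with an empty string ("") using sub from base R. Because .* is greedy, the match extends to the last comma in the string.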

df1$newcounty <- sub(".*,\\s*", "", df1$county)

-output

df1
#  Obs               county  newcounty
#1   1  wisconsin, waukesha   waukesha
#2   2      oklahoma, tulsa      tulsa
#3   3      alabama, shelby     shelby
#4   4 virginia, montgomery montgomery
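
If a value could contain more than one comma, the greedy pattern strips everything through the last one. When only the text before the first comma should be removed, a negated character class is one possible alternative. This is a minimal sketch; the vector `x` is made-up illustration data, not part of the question:

```r
# Hypothetical input: the second value contains two commas
x <- c("wisconsin, waukesha", "virginia, montgomery, extra")

# Greedy .* removes everything through the LAST comma
greedy <- sub(".*,\\s*", "", x)

# [^,]* cannot cross a comma, so this removes through the FIRST comma only
first_only <- sub("^[^,]*,\\s*", "", x)

greedy
first_only
```

For the data in the question, where each value has exactly one comma, both patterns give the same result.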

Another base R option is trimws, whose whitespace argument (available since R 3.6.0) accepts a regular expression:

trimws(df1$county, whitespace = ".*,\\s*")
#[1] "waukesha"   "tulsa"      "shelby"     "montgomery"

data

df1 <- structure(list(Obs = 1:4,
                      county = c("wisconsin, waukesha", "oklahoma, tulsa",
                                 "alabama, shelby", "virginia, montgomery")),
                 class = "data.frame", row.names = c(NA, -4L))
akrun

You can also use strsplit() in this way:

#Code
# vapply returns a character vector; lapply would create a list column instead
df$newcounty <- vapply(strsplit(df$county, split = ","), function(x) trimws(x[[2]]), character(1))

Output:

df
  Obs               county  newcounty
1   1  wisconsin, waukesha   waukesha
2   2      oklahoma, tulsa      tulsa
3   3      alabama, shelby     shelby
4   4 virginia, montgomery montgomery
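
Note that indexing with `x[[2]]` fails with a subscript error for any value that contains no comma. One way to tolerate such rows is to take the last piece of each split instead. This is a rough sketch, assuming a row without a comma should be kept unchanged; the `county` vector below includes one hypothetical value that is not in the question's data:

```r
# Two values from the question plus one hypothetical value with no comma
county <- c("wisconsin, waukesha", "oklahoma, tulsa", "no comma here")

# fixed = TRUE treats "," as a literal string rather than a regex
parts <- strsplit(county, split = ",", fixed = TRUE)

# Take the last piece of each split; vapply enforces a character result
newcounty <- vapply(parts, function(p) trimws(p[length(p)]), character(1))
newcounty
```

An empty string would still need separate handling, since strsplit returns a zero-length piece for it.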

Some data used:

#Data
df <- structure(list(Obs = 1:4, county = c("wisconsin, waukesha", "oklahoma, tulsa", 
"alabama, shelby", "virginia, montgomery")), class = "data.frame", row.names = c(NA, 
-4L))
Duck