Remove last word from string

Question

I'm trying to do something but can't remember/find the answer. I have a list of city names from the Census Bureau and they put the city's type on the end which is messing up my match().

I'd like to make this:

Middletown Township
Sunny Valley Borough
Hillside Village

into this:

Middletown
Sunny Valley
Hillside

Any suggestions? Ideally I'd also like to know if there's a lastIndexOf() function in R.

Here's the data:

df1 <- data.frame(
  id = c(1, 2, 3),
  city = factor(c("Middletown Township", "Sunny Valley Borough", "Hillside Village"))
)

Josh O'Brien · Accepted Answer · 2012-10-26T20:33:34.403

22

This will work:

gsub("\\s*\\w*$", "", df1$city)
[1] "Middletown"   "Sunny Valley" "Hillside"

It removes any substring consisting of zero or more whitespace characters, followed by any number of "word" characters (letters, digits, or underscores), followed by the end of the string.
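As a quick check of that description, here is a minimal sketch that applies the pattern to the question's data and stores the result as a plain character column. The last statement is an aside on the lastIndexOf() part of the question: as far as I know base R has no function by that name, but regexpr() with an end-anchored pattern returns the position of the last space. The names cities, cleaned and last_space are illustrative only.

```r
# City names from the question; the column is a factor, as in the original
cities <- c("Middletown Township", "Sunny Valley Borough", "Hillside Village")
df1 <- data.frame(id = c(1, 2, 3), city = factor(cities))

# Strip optional whitespace plus the trailing run of word characters
cleaned <- gsub("\\s*\\w*$", "", as.character(df1$city))
df1$city <- cleaned  # now a character column, convenient for match()

# Rough lastIndexOf(" ") equivalent: start position of the final
# " <non-spaces>" run, anchored at the end of the string
last_space <- as.integer(regexpr(" [^ ]*$", cities))
```

regexpr() returns -1 for a string that contains no space, so that case needs its own handling.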

edited Oct 26 '12 at 20:33

answered Oct 26 '12 at 20:28

Josh O'Brien



What if I want to get "Township, Borough, Village" i.e. the last word. And save it as a new variable? – jacob Aug 10 '15 at 12:49

You can use the stri_extract_last_words() function from the stringi package. Give it a vector of sentences and it returns the last word of each one. However, it does not remove the last word from the sentence; for that you still need the gsub() command provided by Josh – rkmalaiya May 06 '16 at 14:03
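For the last-word question in the comments, a base-R sketch (an alternative to stringi, not the commenter's code) is to delete everything up to and including the final whitespace character. The column name type is hypothetical.

```r
# Data from the question
df1 <- data.frame(
  id = c(1, 2, 3),
  city = factor(c("Middletown Township", "Sunny Valley Borough", "Hillside Village"))
)

# Greedy ".*" consumes as much as possible, so "\\s" lands on the LAST
# whitespace character; what remains is the final word
df1$type <- sub(".*\\s", "", as.character(df1$city))
```

A value with no whitespace at all is returned unchanged, so a single-word city name would end up as its own "type".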

score 18 · Answer 2 · answered Oct 26 '12 at 20:29


Here's a regexp that does what you need:

sub(df1$city, pattern = " [[:alpha:]]*$", replacement = "")

[1] "Middletown"   "Sunny Valley" "Hillside"

That's replacing a substring that starts with a space, then contains only letters until the end of the string, with an empty string.


Tyler


+1 this regex is better answer as it leaves single word intact. – topchef Jul 25 '13 at 16:56
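The difference this comment points out shows up on a name with no type suffix. A small sketch, using the made-up value "Springfield" as a hypothetical single-word input (the names x, greedy and spaced are illustrative):

```r
x <- c("Middletown Township", "Springfield")

# "\\s*" may match nothing, so a lone word is treated as the "last word"
# and removed entirely
greedy <- gsub("\\s*\\w*$", "", x)

# This pattern requires a literal space, so a lone word is left alone
spaced <- sub(" [[:alpha:]]*$", "", x)
```

Here greedy holds "Middletown" and an empty string, while spaced holds "Middletown" and "Springfield".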

score 6 · Answer 3 · answered Apr 20 '21 at 08:43

I would use word() from the stringr package (with dplyr for the pipe and mutate()) like so:

library(dplyr); library(stringr)
df1 %>% mutate(city = word(city, 1, -2))

The start argument (1) indicates that you're starting from the first word, and the end argument (-2) indicates that you're keeping everything up to and including the second-to-last word.
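If adding packages is not an option, the same "keep everything except the last word" idea can be sketched in base R. This is an assumed equivalent for space-separated names, not what word() does internally, and drop_last_word is a hypothetical helper name.

```r
# Hypothetical helper: split on single spaces, drop the final piece, re-join
drop_last_word <- function(x) {
  pieces <- strsplit(as.character(x), " ", fixed = TRUE)
  vapply(
    pieces,
    function(w) paste(head(w, -1), collapse = " "),  # head(w, -1) drops the last element
    character(1)
  )
}

cities <- factor(c("Middletown Township", "Sunny Valley Borough", "Hillside Village"))
trimmed <- drop_last_word(cities)
```

With this helper a single-word input becomes an empty string, so it shares the edge case discussed for the first regex.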
