I have a column of company names, `db_name$name`.

I've found the 100 most common endings (Inc, Ltd, GmbH, Co, etc.), and concatenated them to make them easier to use with Regular Expressions.

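```r
library(stringr)
library(dplyr)
# Last word of each name is its candidate ending
db_name$ending <- word(db_name$name, -1)
# Keep the 100 most frequent endings
db_end_count <- data.frame(table(db_name$ending)) %>%
  arrange(desc(Freq)) %>%
  filter(row_number() <= 100)
# Join them into one regex alternation, e.g. "Inc|Ltd|GmbH"
db_end <- str_c(db_end_count$Var1, collapse = "|")
```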

I'd like to remove these common endings from the end of each of the strings, while not removing them from the interior words ('Communications Co' not becoming 'mmunications '), and also keeping the company names that only consist of one word.

The solution I've been experimenting with is derived from the question "R remove last word from string", which basically says to use `gsub("\\s*\\w*$", "", db_name$name)`, except I've been replacing `\\w` with my vector of the 100 most common endings, using the rebus package. However, every form I try (with or without the `*` or the `\\s`) results in one of the issues I described above (truncated words, or whole words removed).
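To make the two failure modes concrete, here is a minimal base R sketch. The `company` vector and the three-entry `endings` string are made-up stand-ins (the real data is not shown in the question), so treat this as an illustration of the symptoms rather than a reproduction of the actual column.

```r
# Hypothetical sample names and a tiny stand-in for the 100 endings
company <- c("Communications Co", "BASF AG", "SWAG", "Group")
endings <- "Co|AG|Group"

# No anchor: the endings are removed wherever they occur in the string
unanchored <- gsub(endings, "", company)
# "mmunications " "BASF " "SW" ""

# Anchored with $, but the whitespace is optional (\\s*): an ending that is
# merely the tail of a word, or the whole one-word name, still matches
optional_space <- gsub(paste0("\\s*(", endings, ")$"), "", company)
# "Communications" "BASF" "SW" ""
```

In both variants nothing forces the ending to be a separate trailing word, which appears to be the common cause of the truncated and missing names.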

Could someone suggest a way I could remove the most common company endings from the end of the company name strings, either in the way I've done it so far, or something even more clever? Thanks!

  • look at my answer in this question from today: [HERE](https://stackoverflow.com/questions/49730743/clear-title-from-name-using-r) – Andre Elrico Apr 09 '18 at 13:54
  • @AndreElrico Your answer has some problems, and besides that in general you should not be promoting your stuff like this. – Tim Biegeleisen Apr 09 '18 at 13:56
  • @TimBiegeleisen What are the problems? I'm not promoting but helping. – Andre Elrico Apr 09 '18 at 13:57
  • Nicholas, it would be helpful if you included a subset of the data you are working with. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – C. Braun Apr 09 '18 at 14:00

1 Answer

Thanks for your help guys, but with some assistance from a colleague I was able to plug the vector I wanted into the solution I mentioned above using `paste0`, like so:

`db_name$names <- gsub(paste0("\\s(",db_end,")+$"), "", db_name$name)`
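For anyone wanting to try this end to end, here is a small self-contained sketch in base R. The sample `db_name` data frame and the four-entry `endings` vector are hypothetical. The `\Q...\E` quoting (which needs `perl = TRUE`) is an addition of mine, not part of the line above; it is there only so that an ending such as `Inc.` is matched literally instead of its `.` acting as a wildcard.

```r
# Hypothetical sample data; the real db_name is not shown in the thread
db_name <- data.frame(
  name = c("Acme Inc.", "Communications Co", "Globex GmbH",
           "Initech", "Co Op Stores Ltd"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the 100 most common endings
endings <- c("Inc.", "Co", "GmbH", "Ltd")

# Quote each ending with \Q...\E so regex metacharacters are taken literally
db_end <- paste0("\\Q", endings, "\\E", collapse = "|")

# Require whitespace before the ending and anchor it at the end of the string,
# so interior words and one-word names are left alone
pattern <- paste0("\\s+(", db_end, ")$")

# sub() is enough here: the $ anchor allows at most one match per string
db_name$names <- sub(pattern, "", db_name$name, perl = TRUE)
# "Acme" "Communications" "Globex" "Initech" "Co Op Stores"
```

One thing to be aware of: in `\\s(...)+$` the `+` repeats the group without any whitespace in between, so a stacked ending such as "Co Ltd" would only lose "Ltd". If stacked endings matter, a pattern of the form `(\\s+(...))+$` should strip them all, though I have only reasoned that through on small examples.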