I have a column of country offices associated with a company, and I would like to shorten values such as "China Country Office" and "Bangladesh Country Office" to just "China" or "Bangladesh". In other words, I want to remove the words "Office" and "Country" from the column called Imp_Office.

I tried using the tm package, following an earlier post, but nothing happened.

What I wrote:

library(tm)
stopwords = c("Office", "Country","Regional")
MY_df$Imp_Office <- gsub(paste0(stopwords, collapse = "|","", 
MY_df$Imp_Office))

This gave the following error message:

  Error in gsub(paste0(stopwords, collapse = "|", "", MY_df$Imp_Office)) : 
    argument "x" is missing, with no default

I also tried using the function readLines:

stopwords = readLines("Office", "Country","Regional")
MY_df$Imp_Office <- gsub(paste0(stopwords, collapse = "|","", 
MY_df$Imp_Office))

But this didn't help either.

I have considered the possibility of using some other string manipulation method, but I don't need to detect, replace or remove whitespace - so I am kind of lost here.

Thank you.

BloopFloopy
  • You've got a misplaced closing bracket. `Complete_df$Imp_Office <- gsub(paste0(stopwords, collapse = "|"), "", Complete_df$Imp_Office)` – Jake Kaupp Apr 23 '18 at 16:23
  • `gsub(paste0(stopwords, collapse = "|"), "", Complete_df$Imp_Office)`. You need a parenthesis to close the paste function. – Onyambu Apr 23 '18 at 16:25
  • The solution to the previously asked question [Remove certain words in string from column in dataframe...](https://stackoverflow.com/questions/40901100/remove-certain-words-in-string-from-column-in-dataframe-in-r) may help. – Anthony Simon Mielniczuk Apr 24 '18 at 17:27
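
The fix described in the comments can be sketched in base R as below. The rows of `MY_df` are made up purely for illustration, and the `trimws()` step is an addition that the comments do not mention: it drops the spaces left behind once the words are removed.

```r
# Illustrative data only; the real MY_df comes from the question.
MY_df <- data.frame(
  Imp_Office = c("China Country Office", "Bangladesh Country Office"),
  stringsAsFactors = FALSE
)

stopwords <- c("Office", "Country", "Regional")

# Close paste0() before passing the replacement and the input to gsub().
pattern <- paste0(stopwords, collapse = "|")  # "Office|Country|Regional"
MY_df$Imp_Office <- gsub(pattern, "", MY_df$Imp_Office)

# Removing the words leaves trailing spaces behind, so trim them.
MY_df$Imp_Office <- trimws(MY_df$Imp_Office)
MY_df$Imp_Office
```

Note that none of this needs the tm package; `gsub()`, `paste0()` and `trimws()` are all base R.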

1 Answer

First, let's set up a dataframe with a column like what you describe:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
df <- tibble(Imp_Office = c("China Country Office",
                            "Bangladesh Country Office",
                            "China",
                            "Bangladesh"))
df
#> # A tibble: 4 x 1
#>   Imp_Office               
#>   <chr>                    
#> 1 China Country Office     
#> 2 Bangladesh Country Office
#> 3 China                    
#> 4 Bangladesh

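Then we can use str_remove_all() from the stringr package to remove any bits of text that you don't want. The pattern is a regular expression: `|` separates the alternatives, and the leading space in each alternative is removed along with the word.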

df %>%
    mutate(Imp_Office = str_remove_all(Imp_Office, " Country| Office"))
#> # A tibble: 4 x 1
#>   Imp_Office
#>   <chr>     
#> 1 China     
#> 2 Bangladesh
#> 3 China     
#> 4 Bangladesh

Created on 2018-04-24 by the reprex package (v0.2.0).
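
If the unwanted words can appear anywhere in the string (for example the "Regional" entry in the question's `stopwords` vector), one possible variation is to build the pattern from that vector and clean up the spacing afterwards. This is only a sketch: the `offices` vector is invented for illustration, and the use of word boundaries plus `str_squish()` is an assumption about what the cleaned values should look like, not part of the answer above.

```r
library(stringr)

stopwords <- c("Office", "Country", "Regional")

# Wrap the alternatives in word boundaries so only whole words match.
pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")

# Hypothetical values, including a stopword at the start of a string.
offices <- c("China Country Office", "Regional Office Bangladesh", "China")

# str_squish() trims the ends and collapses repeated internal spaces.
cleaned <- str_squish(str_remove_all(offices, pattern))
cleaned
```

The same expression can be dropped into `mutate()` in place of the fixed `" Country| Office"` pattern.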

Julia Silge
  • One thing to consider could be to use "[ ]" as it was suggested: https://www.rdocumentation.org/packages/stringr/versions/1.4.0/topics/str_remove . So df %>% mutate(Imp_Office = str_remove_all(Imp_Office, " [Country| Office]")) – Cenk Jan 10 '21 at 07:50
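
A word of caution on the bracketed form: in a regular expression, `[...]` defines a character class, which matches a single character from the set rather than a whole word. The short comparison below, using one made-up input string, suggests the two patterns are not interchangeable.

```r
library(stringr)

x <- "China Country Office"

# Alternation: removes the whole words along with their leading space.
with_alternation <- str_remove_all(x, " Country| Office")

# Character class: removes a space plus ONE following character from the set.
with_brackets <- str_remove_all(x, " [Country| Office]")

with_alternation  # "China"
with_brackets     # "Chinaountryffice"
```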