I have a df with 2 columns; the second one holds strings that contain special characters and other characters I want to remove.

The problem

I have written a for loop that works, but only after being executed three times!

Libraries & Data

library(tidyverse)
client_id <- 1:10 
client_name <- c("name5", "-name", "name--", "name-µ", "name²", "name31", "7name8", "name514", "²name8")
df <- data.frame(cbind(client_id, client_name))

Patterns to be removed

patterns <- list("-", "--", "[:digit:]", "[:cntrl:]" , "µ" , "²" , "[:punct:]")

What I have done

To remove the unwanted patterns in the second column, client_name, I have written the following for loop:

for(ptrn in patterns) {
    df <- df %>% 
      mutate(client_name = str_remove(df$client_name, ptrn))

    print(ptrn) # progress
}

The above for loop removes all unwanted patterns, but only after being executed three times.

How can I fix it so that all unwanted patterns are removed on the first execution?

Should I nest the above for loop inside another one in order to iterate over client_name[i]? Thanks

  • Does this answer your question? [remove multiple patterns from text vector r](https://stackoverflow.com/questions/29036960/remove-multiple-patterns-from-text-vector-r). You do not need a loop to do this; see this answer: https://stackoverflow.com/a/56421295/10264278. I think it is what you need. – Paul Aug 19 '21 at 11:48
  • Thanks @Paul. The following works well: `df$client_name <- str_remove_all(df$client_name, paste(patterns, collapse = "|"))` – Yacine Hafiane Aug 19 '21 at 12:00

2 Answers

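This is a more straightforward method:

Instead of making a list of all unwanted characters, you can use str_extract to keep only the wanted ones, which, in your case, are the (Roman) alphabetic characters. str_extract returns the first run of such characters in each string, which is enough for this data: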

library(dplyr)   # provides %>% and mutate()
library(stringr)
df %>%
  mutate(client_name = str_extract(client_name, "[A-Za-z]+"))
   client_id client_name
1          1        name
2          2        name
3          3        name
4          4        name
5          5        name
6          6        name
7          7        name
8          8        name
9          9        name
10        10        name
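
One caveat worth illustrating, as a rough sketch rather than part of the original answer: because str_extract keeps only the first run of letters, a name whose letters are interrupted by another character would be cut short. The vector x below is a made-up example (it is not the question's data), and removing every non-letter with str_remove_all is shown as one possible alternative.

```r
library(stringr)

# Hypothetical input: "na-me" has its letters split by a hyphen
x <- c("7name8", "na-me", "name514")

# str_extract() keeps only the first run of ASCII letters in each string
first_run <- str_extract(x, "[A-Za-z]+")

# Removing every non-letter keeps all letters, even when they are split up
letters_only <- str_remove_all(x, "[^A-Za-z]")

print(first_run)
print(letters_only)
```

For the data in the question both approaches give the same result, since every name there contains a single run of letters.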
Chris Ruehlemann

You can collapse the patterns into a single regex and use str_remove_all to remove every occurrence of it.

library(dplyr)
library(stringr)

ptrn <- paste0(patterns, collapse = '|')

df <- df %>% mutate(client_name = str_remove_all(client_name, ptrn))
df

#  client_id client_name
#1         1        name
#2         2        name
#3         3        name
#4         4        name
#5         5        name
#6         6        name
#7         7        name
#8         8        name
#9         9        name
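
A likely reason the loop in the question needed three passes, sketched here under assumptions: str_remove drops only the first match in each string, so a value such as "name514" loses one digit per run of the loop, whereas str_remove_all drops every match. The vector x and the shortened pattern list below are illustrative stand-ins, not the original data.

```r
library(stringr)

# Illustrative values only (a subset in the spirit of the question's data)
x <- c("name--", "name514", "7name8")

# str_remove() removes just the first digit found in each string
one_match <- str_remove(x, "[0-9]")

# str_remove_all() removes every digit
all_matches <- str_remove_all(x, "[0-9]")

# The same loop shape as in the question finishes in a single pass
# once str_remove_all() is used (shortened, assumed pattern list)
patterns_demo <- c("-", "[0-9]")
cleaned <- x
for (ptrn in patterns_demo) {
  cleaned <- str_remove_all(cleaned, ptrn)
}

print(one_match)
print(all_matches)
print(cleaned)
```

So if the loop is kept, swapping str_remove for str_remove_all should be enough; collapsing the patterns with "|" simply does the same work in one call.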

data

client_id <- 1:9 
client_name <- c("name5", "-name", "name--", "name-µ", "name²", "name31", "7name8", "name514", "²name8")
df <- data.frame(client_id, client_name)
Ronak Shah
  • Thanks @Ronak, this also works well: `ptrn <- paste0(patterns, collapse = '|'); df <- df %>% mutate(client_name = str_remove_all(client_name, ptrn))` – Yacine Hafiane Aug 19 '21 at 12:04