Looping over a list of patterns to remove them from a string column in R

Question

I have a df with 2 columns, where the second one holds strings that contain special characters and other characters I want to remove.

The problem

I have written a for loop that works, but only after being executed three times!

Libraries & Data

library(tidyverse)
client_id <- 1:10 
client_name <- c("name5", "-name", "name--", "name-µ", "name²", "name31", "7name8", "name514", "²name8")
df <- data.frame(cbind(client_id, client_name))

Patterns to be removed

patterns <- list("-", "--", "[:digit:]", "[:cntrl:]" , "µ" , "²" , "[:punct:]")

What I have done

To remove the unwanted patterns from the second column, client_name, I have written the following for loop:

for(ptrn in patterns) {
    df <- df %>% 
      mutate(client_name = str_remove(df$client_name, ptrn))

    print(ptrn) # progress
}

The above for loop removes all unwanted patterns, but only after being executed three times.

How can I fix it so that all unwanted patterns are removed on the first execution?

Should I nest the above for loop inside another one in order to iterate over client_name[i]? Thanks

Does this answer your question? [remove multiple patterns from text vector r](https://stackoverflow.com/questions/29036960/remove-multiple-patterns-from-text-vector-r). You do not need a loop to do this, see this answer: https://stackoverflow.com/a/56421295/10264278 I think it is what you need. — Paul, Aug 19 '21 at 11:48
Thanks @Paul. The following works well: `df$client_name <- str_remove_all(df$client_name, paste(patterns, collapse = "|"))` — Yacine Hafiane, Aug 19 '21 at 12:00

score 3 · Answer 1 · answered Aug 19 '21 at 12:03

This is a more straightforward method:

Instead of listing every unwanted character, you can use str_extract to keep only the wanted ones, which, in your case, are the (Roman) alphabetic characters:

library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides str_extract()
df %>%
  mutate(client_name = str_extract(client_name, "[A-Za-z]+"))
   client_id client_name
1          1        name
2          2        name
3          3        name
4          4        name
5          5        name
6          6        name
7          7        name
8          8        name
9          9        name
10        10        name
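
One caveat worth checking against your real data: str_extract returns only the first match, so a name whose letters are interrupted by another character would be cut short. The sketch below compares it with str_remove_all on the negated class `[^A-Za-z]`, which keeps every letter. The inputs "na7me" and "123" are hypothetical and do not come from the question's data; the non-ASCII character is written as a Unicode escape only for portability.

```r
library(stringr)

# "na7me" and "123" are made-up inputs for illustration
x <- c("7name8", "na7me", "name-\u00b5", "123")  # \u00b5 is the micro sign

# Keeps the FIRST run of ASCII letters only; NA when there is none
first_run <- str_extract(x, "[A-Za-z]+")
first_run
# "name" "na" "name" NA

# Deletes every character that is not an ASCII letter
all_letters <- str_remove_all(x, "[^A-Za-z]")
all_letters
# "name" "name" "name" ""
```

For the sample data in the question both approaches give the same result, since each name is a single uninterrupted run of letters.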

Chris Ruehlemann


Thanks @Chris, it works well too, but is somehow less readable due to the regular expression used, `"[A-Za-z]+"` – Yacine Hafiane Aug 19 '21 at 12:20
Well, if `[A-Za-z]+` is *less readable* then you have not seen many regex expressions yet ;) – Chris Ruehlemann Aug 19 '21 at 12:33
I know I know – Yacine Hafiane Aug 20 '21 at 14:36

score 1 · Accepted Answer · answered Aug 19 '21 at 11:48

You can collapse the patterns into a single regex pattern and use str_remove_all to remove all occurrences of it.

library(dplyr)
library(stringr)

ptrn <- paste0(patterns, collapse = '|')

df <- df %>% mutate(client_name = str_remove_all(client_name, ptrn))
df

#  client_id client_name
#1         1        name
#2         2        name
#3         3        name
#4         4        name
#5         5        name
#6         6        name
#7         7        name
#8         8        name
#9         9        name

data

client_id <- 1:9 
client_name <- c("name5", "-name", "name--", "name-µ", "name²", "name31", "7name8", "name514", "²name8")
df <- data.frame(client_id, client_name)
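
As for why the original loop needed three runs: str_remove deletes only the first match in each string, so a value such as "name514" loses one digit per pass. The sketch below illustrates this on a plain character vector and shows that the loop itself also works in a single run once str_remove is swapped for str_remove_all. The pattern vector here is a slightly rewritten version of the question's list (bracketed POSIX classes, `[0-9]` for digits, and Unicode escapes for the two non-ASCII characters), chosen for portability rather than taken from the original post.

```r
library(stringr)

# Same values as the question's data; \u00b5 is the micro sign, \u00b2 is superscript two
client_name <- c("name5", "-name", "name--", "name-\u00b5", "name\u00b2",
                 "name31", "7name8", "name514", "\u00b2name8")

# One pass with str_remove(): only the FIRST digit of each string is deleted
first_only <- str_remove(client_name, "[0-9]")
first_only[6:8]
# "name1" "name8" "name14"

# Assumed rewrite of the question's pattern list, not the original one
patterns <- c("[0-9]", "[[:cntrl:]]", "\u00b5", "\u00b2", "[[:punct:]]")

# The loop, run once, with str_remove_all() deleting every match per pattern
cleaned <- client_name
for (ptrn in patterns) {
  cleaned <- str_remove_all(cleaned, ptrn)
}
cleaned
# "name" repeated 9 times
```

So no nested loop over the individual strings is needed; the stringr functions are already vectorised over the column.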

Thanks @Ronak this works also well: `ptrn <- paste0(patterns, collapse = '|') df <- df %>% mutate(client_name = str_remove_all(client_name, ptrn))` — Yacine Hafiane, Aug 19 '21 at 12:04
