Mutate to remove all parentheses (and contents) from a string in R

Question

I'm trying to use mutate/str_replace to generate "Phenotype" from "Class" by removing the parentheses (including their contents), but I need some help with the regex. I would then like to reorder the markers within each "Phenotype" string so that they appear in the order PanCK > PD-L1 > CD8 > FoxP3 > PD-1 > CD68. Apologies for the non-standard dataset! Many thanks!

test <- data.frame(Class = c("FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780)",
                             "CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)",
                             "PanCK (Opal 690): CD68 (Opal 780)",
                             "FoxP3 (Opal 570): PanCK (Opal 690)"))

The part I'm having trouble with:

test.output<- test %>% mutate(Phenotype = str_replace(Class, "\\([^()]{0,}\\)", ""))

Desired output:

test.output <- data.frame(Class = c("FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780)",
                                    "CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)",
                                    "PanCK (Opal 690): CD68 (Opal 780)",
                                    "FoxP3 (Opal 570): PanCK (Opal 690)"),
                          Phenotype = c("FoxP3:PanCK:PD-1:CD68", "CD8:PanCK:CD68:PD-L1",
                                        "PanCK:CD68", "FoxP3:PanCK"))

This should then be reordered so that the markers appear in the order PanCK > PD-L1 > CD8 > FoxP3 > PD-1 > CD68:

ordered.output <- data.frame(Class = c("FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780)",
                                       "CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)",
                                       "PanCK (Opal 690): CD68 (Opal 780)",
                                       "FoxP3 (Opal 570): PanCK (Opal 690)"),
                             Phenotype = c("FoxP3:PanCK:PD-1:CD68", "CD8:PanCK:CD68:PD-L1",
                                           "PanCK:CD68", "FoxP3:PanCK"),
                             Phenotype_Ordered = c("PanCK:FoxP3:PD-1:CD68", "PanCK:PD-L1:CD8:CD68",
                                                   "PanCK:CD68", "PanCK:FoxP3"))

Removing parens and their contents is a duplicate [of this question](https://stackoverflow.com/a/24173271/903061) - maybe you can apply the answer there and edit this question to focus on the reordering? — Gregor Thomas, Nov 12 '20 at 04:46
You've got the right idea with your regex, I think you just need to change `str_replace` (replaces first match) to `str_replace_all` (replaces all matches). — Gregor Thomas, Nov 12 '20 at 04:53
Thank you! I did read that thread, but I couldn't see how it applied to multiple parentheses in a single string. This is a good example of `str_replace_all` outside of the standard tidyverse examples. — JMonk, Nov 12 '20 at 09:24
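To see the difference the comment above describes, here is a minimal sketch on one of the question's strings, assuming the stringr package is installed. `str_replace()` stops after the first match, while `str_replace_all()` removes every parenthesised group. Note that the spaces around each group are left behind, which is why the desired "FoxP3:PanCK" form also needs a whitespace-removal step.

```r
library(stringr)

x <- "FoxP3 (Opal 570): PanCK (Opal 690)"
pattern <- "\\([^()]{0,}\\)"  # regex from the question: "(", any non-paren characters, ")"

# str_replace() only replaces the FIRST match
first_only <- str_replace(x, pattern, "")
first_only
#> [1] "FoxP3 : PanCK (Opal 690)"

# str_replace_all() replaces EVERY match, but the surrounding spaces stay behind
all_removed <- str_replace_all(x, pattern, "")
all_removed
#> [1] "FoxP3 : PanCK "

# Removing all whitespace afterwards gives the compact form from the desired output
compact <- str_remove_all(all_removed, "\\s")
compact
#> [1] "FoxP3:PanCK"
```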

Karthik S · Answer 1 · 2020-11-12T05:36:13.810

Does this work?

library(dplyr)
library(stringr)
library(tidyr)  # for unnest()

st <- c('PanCK','PD-L1','CD8','FoxP3','PD-1','CD68')  # desired marker order
test %>% 
  mutate(Phenotype = str_remove_all(Class, '\\s\\(Opal [0-9]{3}\\)')) %>%  # drop " (Opal NNN)"
  mutate(Phenotype = str_remove_all(Phenotype, '(\\s)')) %>%              # drop remaining spaces
  mutate(Phenotype_Ordered = str_split(Phenotype, ':')) %>%               # list of markers per row
  unnest(Phenotype_Ordered) %>%                                           # one row per marker
  group_by(Class) %>% arrange(factor(Phenotype_Ordered, levels = st)) %>% # sort markers by st
  mutate(Phenotype_Ordered = paste(Phenotype_Ordered, collapse = ':')) %>% # rejoin per Class
  distinct()
# A tibble: 4 x 3
# Groups:   Class [4]
  Class                                                                Phenotype             Phenotype_Ordered    
  <chr>                                                                <chr>                 <chr>                
1 FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780) FoxP3:PanCK:PD-1:CD68 PanCK:FoxP3:PD-1:CD68
2 CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)  CD8:PanCK:CD68:PD-L1  PanCK:PD-L1:CD8:CD68 
3 PanCK (Opal 690): CD68 (Opal 780)                                    PanCK:CD68            PanCK:CD68           
4 FoxP3 (Opal 570): PanCK (Opal 690)                                   FoxP3:PanCK           PanCK:FoxP3

Brilliant! Why does the whitespace need to be removed for this to work, with `mutate(Phenotype = str_remove_all(Phenotype, '(\\s)'))`? — JMonk, Nov 12 '20 at 09:21
The regex with `str_replace_all` also works for the first step here: `test %>% mutate(Phenotype = str_replace_all(Class, "\\([^()]{0,}\\)", ""))` — JMonk, Nov 12 '20 at 09:25
@JamesMonkman, I just went by your expected output regarding the white spaces. Yes, regex in general can be approached in different ways. — Karthik S, Nov 12 '20 at 09:27
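A short base-R sketch of why the whitespace matters for the ordering step (the variable names here are illustrative only). After stripping just " (Opal NNN)", a space remains after each colon, so splitting on ":" yields values such as " PanCK" that match no level of `st`. `factor()` turns those into `NA`, so the sort has nothing to order them by. `gsub("\\s", "", ...)` is used here as a base-R stand-in for the answer's `str_remove_all(Phenotype, '(\\s)')`.

```r
st <- c("PanCK", "PD-L1", "CD8", "FoxP3", "PD-1", "CD68")

# After removing only " (Opal NNN)", a space is still left after each colon
with_spaces <- strsplit("FoxP3: PanCK: PD-1: CD68", ":")[[1]]
# with_spaces is c("FoxP3", " PanCK", " PD-1", " CD68")

# " PanCK" (leading space) is not a level of st, so factor() makes it NA
f_bad <- factor(with_spaces, levels = st)
n_lost <- sum(is.na(f_bad))          # 3 of the 4 markers have no rank

# order() puts NA last, so PanCK does not move to the front
bad_order <- with_spaces[order(f_bad)]

# Stripping whitespace first lets every marker match a level
no_spaces <- gsub("\\s", "", with_spaces)
f_good <- factor(no_spaces, levels = st)
good_order <- paste(no_spaces[order(f_good)], collapse = ":")
good_order
#> [1] "PanCK:FoxP3:PD-1:CD68"
```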

score 1 · Accepted Answer · answered Nov 12 '20 at 10:09

Another trick is:

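library(dplyr)
library(purrr)
library(stringr)

# Lowest priority first: sorting the ordered factor in decreasing order puts PanCK first
my_order <- c("CD68", "PD-1", "FoxP3", "CD8", "PD-L1", "PanCK")
test %>% 
  mutate(prototype = gsub('\\s*[(][^)]+[)]', '', Class),   # drop " (Opal NNN)" groups
         ordered = map_chr(strsplit(prototype, '\\s*:\\s*'),
                           ~str_c(sort(ordered(.x, my_order), decreasing = TRUE), collapse = ":")))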
                                                                 Class                prototype               ordered
1 FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780) FoxP3: PanCK: PD-1: CD68 PanCK:FoxP3:PD-1:CD68
2  CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)  CD8: PanCK: CD68: PD-L1  PanCK:PD-L1:CD8:CD68
3                                    PanCK (Opal 690): CD68 (Opal 780)              PanCK: CD68            PanCK:CD68
4                                   FoxP3 (Opal 570): PanCK (Opal 690)             FoxP3: PanCK           PanCK:FoxP3
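As an alternative to the ordered-factor trick, the ranking can also be done with base R's `match()`, which returns each marker's position in the priority vector. This is a sketch, not either answerer's method; `reorder_markers()` is a hypothetical helper name, and the regex assumes the parentheses are never nested.

```r
# Priority order from the question: PanCK > PD-L1 > CD8 > FoxP3 > PD-1 > CD68
st <- c("PanCK", "PD-L1", "CD8", "FoxP3", "PD-1", "CD68")

# Hypothetical helper: strip "(...)" groups, split on ":", sort by position in `priority`
reorder_markers <- function(class_strings, priority) {
  stripped <- gsub("\\s*\\([^)]*\\)", "", class_strings)  # drop " (Opal 570)" etc.
  parts <- strsplit(stripped, "\\s*:\\s*")                   # split on ":" and nearby spaces
  vapply(parts, function(markers) {
    # match() gives each marker's rank; markers not in `priority` get NA,
    # and order() keeps NA entries at the end instead of dropping them
    paste(markers[order(match(markers, priority))], collapse = ":")
  }, character(1))
}

test <- data.frame(Class = c(
  "FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780)",
  "CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)",
  "PanCK (Opal 690): CD68 (Opal 780)",
  "FoxP3 (Opal 570): PanCK (Opal 690)"
))

test$Phenotype_Ordered <- reorder_markers(test$Class, st)
test$Phenotype_Ordered
#> [1] "PanCK:FoxP3:PD-1:CD68" "PanCK:PD-L1:CD8:CD68"  "PanCK:CD68"            "PanCK:FoxP3"
```

One design difference worth knowing: `sort()` drops `NA` values by default, so in the accepted answer a marker missing from `my_order` would silently disappear from the output. `order()` keeps `NA` ranks at the end, so the helper above retains unknown markers (for example, a hypothetical "Ki67" would be placed last).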
