I would like to use stringr and rebus to remove parts of strings in a data frame. Specifically, I want to remove everything from the first space that is followed by a digit through to the end of the string.

The following is my dataframe:

df<-data.frame(ID = 1:8, Medication = c("FOLIC ACID 5MG TABLET", "RIBAVIRIN 200MG TAB", "ACARBOSE 50MG TABLET", 
                                        "AmLODIPine 5MG TABLET", "MAGNESIUM TRISILICATE MIXTURE 200ML", 
                                        "RESONIUM 15G/60ML SUSPENSION", "CALCIUM & VIT D TABLET", NA))

My desired dataframe is:

df_new<-data.frame(ID = 1:8, Medication = c("FOLIC ACID", "RIBAVIRIN", "ACARBOSE", 
                                            "AmLODIPine", "MAGNESIUM TRISILICATE MIXTURE", 
                                            "RESONIUM", "CALCIUM & VIT D TABLET", NA))

I tried the following code, but it only removes the drug strength (e.g. 5MG), not the dosage form that follows it (e.g. TABLET):

df %>% mutate(Medication = str_replace(Medication, pattern = SPC %R% 
                                         one_or_more(DGT) %R% 
                                         one_or_more(WRD) %R%
                                         or(one_or_more(SPC), one_or_more(WRD)), 
                                       replace = ""))

How can I fix this?

mookid8000
HNSKD

1 Answer

  # Remove everything from the first whitespace-plus-digit to the end of each string
  transform(df, Medication = sub("\\s\\d.*", "", Medication))
  ID                    Medication
1  1                    FOLIC ACID
2  2                     RIBAVIRIN
3  3                      ACARBOSE
4  4                    AmLODIPine
5  5 MAGNESIUM TRISILICATE MIXTURE
6  6                      RESONIUM
7  7        CALCIUM & VIT D TABLET
8  8                          <NA>
Onyambu
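Since the question asked for a stringr solution, the same pattern can be sketched inside a dplyr pipeline. This is only an illustration, not part of the original answer: it assumes dplyr and stringr (1.3.0 or later, for `str_remove()`) are installed, and it uses a shortened three-row copy of the data. The rebus spelling shown in the comment is an untested assumption about how the same pattern would be written with that package.

```r
library(dplyr)
library(stringr)

# Shortened copy of the question's data, for illustration only
df <- data.frame(
  ID = 1:3,
  Medication = c("FOLIC ACID 5MG TABLET", "CALCIUM & VIT D TABLET", NA),
  stringsAsFactors = FALSE
)

# "\\s\\d.*" reads as: one whitespace character, one digit,
# then any remaining characters up to the end of the string.
# Assumed rebus equivalent: SPC %R% DGT %R% zero_or_more(ANY_CHAR)
df_new <- df %>%
  mutate(Medication = str_remove(Medication, "\\s\\d.*"))

print(df_new)
```

Strings without a space followed by a digit are left unchanged, and `NA` stays `NA`.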
  • May I ask what is the difference between `[:digit:]` and `\d`? – HNSKD Feb 02 '18 at 07:47
  • For English (ASCII) text there is no difference, as far as I know. – Onyambu Feb 02 '18 at 07:52
  • However, `[0-9]` and `\d` can differ: `[0-9]` matches only the ASCII digits 0 to 9, while in Unicode-aware regex engines `\d` also matches decimal digits from other scripts (e.g. Arabic-Indic or Devanagari). – Onyambu Feb 02 '18 at 07:55
  • See [this question](https://stackoverflow.com/questions/16621738/d-is-less-efficient-than-0-9) for more information. – Onyambu Feb 02 '18 at 07:57
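
The difference discussed in the comments can be checked with a small sketch. It assumes stringr is installed; stringr uses the ICU regex engine, where `\d` matches any Unicode decimal digit. Base R's default engine may behave differently depending on the locale, so this illustrates stringr only.

```r
library(stringr)

arabic_three <- "\u0663"  # ARABIC-INDIC DIGIT THREE

ascii_d     <- str_detect("3", "\\d")             # TRUE
ascii_range <- str_detect("3", "[0-9]")           # TRUE
other_d     <- str_detect(arabic_three, "\\d")    # TRUE: \d covers Unicode decimal digits
other_range <- str_detect(arabic_three, "[0-9]")  # FALSE: outside the ASCII range 0-9

print(c(ascii_d, ascii_range, other_d, other_range))
```

For data like the medication names in the question, which contain only ASCII digits, either form gives the same result.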