I have a column of strings in a data frame. Several of the strings contain typos, and I want to fix them all in one line of code. The data frame is shown below:
comment <- c("3.3% 1AT $100000 AND 1.2% BALANCE", "3.3% 1DT $100000 AND 1.2% BALANCE",
             "3.3% 1S $100000 AND 1.2% BALANCE", "3.3% 1SST $100000 AND 1.2% BALANCE", "3.3% 1S T$100000 AND 1.2% BALANCE")
df <- data.frame(comment)
comment
1 3.3% 1AT $100000 AND 1.2% BALANCE
2 3.3% 1DT $100000 AND 1.2% BALANCE
3 3.3% 1S $100000 AND 1.2% BALANCE
4 3.3% 1SST $100000 AND 1.2% BALANCE
5 3.3% 1S T$100000 AND 1.2% BALANCE
Basically, I want to replace 1AT, 1DT, 1S, and 1SST with 1ST.
I followed the question "Can I use an OR in regex without capturing what's enclosed?" and wrote the code below:
gsub("^(\\d\\.\\d%) (?: 1AT|1DT|1S|1SST) (\\$100000) AND (\\d\\.\\d%) BALANCE","\\1 1ST \\2 AND \\3 BALANCE", df$comment)
I understand that this code wouldn't change "1S T$100000" to the right format, but I expected it to work on the other rows. However, it did not change any of the strings. Is there a problem with it?
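For comparison, here is a minimal sketch of the same substitution with one detail changed: in the pattern above there is a space right after `(?:`, and that space belongs only to the first alternative, so `1AT` would need two spaces in front of it to match. The sketch removes that space and lists `1SST` before `1S`. Setting `perl = TRUE` and `stringsAsFactors = FALSE` are assumptions of this sketch, not something taken from the original code, and this is not a confirmed diagnosis of why nothing changed.

```r
# Sample data copied from the question.
comment <- c("3.3% 1AT $100000 AND 1.2% BALANCE",
             "3.3% 1DT $100000 AND 1.2% BALANCE",
             "3.3% 1S $100000 AND 1.2% BALANCE",
             "3.3% 1SST $100000 AND 1.2% BALANCE",
             "3.3% 1S T$100000 AND 1.2% BALANCE")
# stringsAsFactors = FALSE is an assumption so the column stays character.
df <- data.frame(comment, stringsAsFactors = FALSE)

# Same structure as the original pattern, but with no space after "(?:"
# and with the longer alternative 1SST listed before 1S.
# perl = TRUE is an assumption: it asks for the PCRE engine explicitly.
pattern <- "^(\\d\\.\\d%) (?:1AT|1DT|1SST|1S) (\\$100000) AND (\\d\\.\\d%) BALANCE"

# The non-capturing group is not numbered, so \\2 is the dollar amount
# and \\3 is the second percentage.
fixed <- gsub(pattern, "\\1 1ST \\2 AND \\3 BALANCE", df$comment, perl = TRUE)
print(fixed)
```

With this pattern the first four rows should come out as `3.3% 1ST $100000 AND 1.2% BALANCE`, while the fifth row (`1S T$100000`) is left untouched because no alternative is followed by a space and then `$`.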