I have a dataset that I am cleaning step by step. One challenge is that people write their comments in different ways: sometimes a number appears as $100K and sometimes as $100000. How can I replace the K with three zeros (i.e. multiply the number by 1000), so that $43K becomes $43000? Sample data is shown below:
structure(list(comment3 = c("3.22%-1ST $100K/1.15% BAL", "3.25% ON 1ST $100000/1.16% ON BAL",
"3.22% 1ST 100K/1.16 ON BAL", "3.22% 1ST 100K/1.15% ON BAL",
"3.26% 1ST 100K/1.16% ON BAL", "3.20% 1ST 100K/1.15% ON BAL",
"3.22% ON 1ST 100K & 1.15% ON BALANCE")), row.names = c(NA, -7L
), class = c("tbl_df", "tbl", "data.frame"))
1 3.22%-1ST $100K/1.15% BAL
2 3.25% ON 1ST $100000/1.16% ON BAL
3 3.22% 1ST 100K/1.16 ON BAL
4 3.22% 1ST 100K/1.15% ON BAL
5 3.26% 1ST 100K/1.16% ON BAL
6 3.20% 1ST 100K/1.15% ON BAL
7 3.22% ON 1ST 100K & 1.15% ON BALANCE
I tried the approach explained in "Convert from K to thousand (1000) in R", but I wasn't successful. Here is my code:
as.numeric(sub("\\d\\d\\d[K]","Ke3", data$comment3, fixed=TRUE))
I was hoping that putting K in [] would let me isolate it, so that I could then multiply the number in front of it by 1000, but that didn't work.
The problem is that each value mixes text and numbers, so I first have to select the number, multiply it by 1000, and put the result back, and I don't know how to do that. I also have an inefficient method that works for now:
library(stringr)  # provides str_match()
bb <- str_match(df_com$comment3, pattern = "\\d\\d\\dK")
table(bb)
By doing this I found that I only have cases like 100K, 350K and 110K, so I replaced them by hand with 100000, 350000 and 110000. This method is inefficient and kind of stupid, though! Any suggestions on how to fix this?
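For reference, here is a minimal sketch of the kind of replacement I am after, assuming every K directly follows a whole number (as in the 100K, 350K and 110K cases found above). It is not a tested solution for the full dataset. The vector name `comment3` and the result name `cleaned` are just illustrative. Note that `fixed = TRUE` in my attempt makes `sub()` treat the pattern as literal text, so the `\\d` parts never match. The sketch uses a regular expression with a lookbehind instead.

```r
# Three of the sample comments from the data above
comment3 <- c("3.22%-1ST $100K/1.15% BAL",
              "3.25% ON 1ST $100000/1.16% ON BAL",
              "3.22% ON 1ST 100K & 1.15% ON BALANCE")

# Replace a K that is directly preceded by a digit and ends a token with "000".
# (?<=\\d) is a lookbehind, so the digits themselves are left untouched;
# perl = TRUE is needed for lookbehind support in base R.
# Assumption: the number before K is a whole number (e.g. "1.5K" would break).
cleaned <- gsub("(?<=\\d)K\\b", "000", comment3, perl = TRUE)

print(cleaned)
```

With the real data this would presumably be applied as `df_com$comment3 <- gsub(...)`. Rows that already contain $100000 are left unchanged because they contain no K after a digit.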