Can you change "1.00K" to "1,000" or "1.00M" to "1,000,000" in R? The values are currently stored as character strings.
- https://stackoverflow.com/questions/71157381/thousand-separator-to-numeric-columns-in-r/71172863#71172863 ??? – Onyambu Feb 20 '22 at 19:20
- @Onyambu It looks like that is the reverse of what I am trying to do. – Leah Feb 20 '22 at 19:23
- Do you need your results to be numeric or character? – Onyambu Feb 20 '22 at 19:25
- Also, how do you end up with `M` and `K`? Are you in any way formatting from numeric to character with M and K, and then want to revert back? – Onyambu Feb 20 '22 at 19:27
- @Onyambu Numeric, please. We need it for comparisons, e.g., it makes it easier to compare "1.00M" to "1.00K". – Leah Feb 20 '22 at 19:28
- @Onyambu The M and the K were provided in the dataset. – Leah Feb 20 '22 at 19:28
- Welcome to SO. While your question seems clear in what you are asking, it is generally good practice to provide a reproducible example (reprex). Also, without constructing a wall of text, it is helpful to disclose what approaches you tried. – ncraig Feb 20 '22 at 19:37
4 Answers
If you need the result as numeric, you could do it with regular expressions:
numbers <- c("5.00K", "1.00M", "100", "3.453M")
as.numeric(sub("^(\\d+\\.?\\d*).*$", "\\1", numbers)) *
ifelse(grepl("K", numbers), 1000, 1) *
ifelse(grepl("M", numbers), 1e6, 1)
#> [1] 5000 1000000 100 3453000
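
If more suffixes turn up in the data, the same idea can be wrapped in a small function driven by a lookup table. This is only a sketch: `parse_suffixed` and its `multipliers` argument are hypothetical names, and the "B" entry for billions is an assumption that is not part of the question.

```r
# Hypothetical helper (not from the answer above): parse "5.00K"-style strings.
# The "B" (billions) entry is an assumption; adjust `multipliers` to your data.
parse_suffixed <- function(x, multipliers = c(K = 1e3, M = 1e6, B = 1e9)) {
  x <- trimws(x)
  has_suffix <- grepl("[A-Za-z]$", x)
  # Last character, upper-cased, when it is a letter; "" otherwise.
  suffix <- ifelse(has_suffix, toupper(substring(x, nchar(x))), "")
  # Numeric part: everything except the trailing letter.
  number <- as.numeric(ifelse(has_suffix, substring(x, 1, nchar(x) - 1), x))
  # Unknown suffixes have no entry in `multipliers` and so yield NA.
  scale <- ifelse(has_suffix, unname(multipliers[suffix]), 1)
  number * scale
}

parse_suffixed(c("5.00K", "1.00M", "100", "3.453M"))
```

Keeping the multipliers in a named vector means a new suffix only needs a new entry, and an unrecognised suffix comes back as `NA` rather than being silently treated as 1.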

We may also do this by replacing the 'K' and 'M' with `e3` and `e6`, respectively, using `str_replace_all`, and then converting directly to numeric:
library(stringr)
as.numeric(str_replace_all(str1, setNames(c("e3", "e6"), c("K", "M"))))
[1] 5000 1000000 100 3453000
data
str1 <- c("5.00K", "1.00M", "100", "3.453M")
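
Since the asker needs the values for comparisons, here is a rough sketch of the same replacement applied to a data frame column. The data frame `df` and its column names are made up for illustration, and nested base-R `sub()` calls stand in for `str_replace_all` so the snippet runs without extra packages.

```r
# Hypothetical data frame; the column names are assumptions for illustration.
df <- data.frame(id = c("a", "b", "c"),
                 amount = c("1.00M", "1.00K", "250"),
                 stringsAsFactors = FALSE)

# Base-R stand-in for the stringr call: K -> e3, M -> e6, then parse.
amount_sci <- sub("M", "e6", sub("K", "e3", df$amount, fixed = TRUE), fixed = TRUE)
df$amount_num <- as.numeric(amount_sci)

# Numeric comparisons and sorting now work on the new column.
df$amount_num[1] > df$amount_num[2]   # TRUE: 1,000,000 > 1,000
df[order(df$amount_num), "id"]        # "c" "b" "a"
```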

Here is another approach:
x <- "1.00K"
format(as.numeric(sub("K", "e3", x, fixed = TRUE)), big.mark = ",")
[1] "1,000"
options(scipen = 100)
y <- "1.00M"
format(as.numeric(sub("M", "e6", y, fixed = TRUE)), big.mark=",")
[1] "1,000,000"
- Explanation: `sub("K", "e3", x, fixed = TRUE)` gives `"1.00e3"` (i.e., `K` is replaced by `e3`). Adding `as.numeric(..)`, `as.numeric("1.00e3")` gives `1000`, and wrapping it in `format(..., big.mark = ",")`, `format(as.numeric(sub("K", "e3", x, fixed = TRUE)), big.mark = ",")` gives `"1,000"`.
- Now the same procedure for `M`, but here we need `e6`.
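
For a whole vector, one possible variation (a sketch, not part of the answer above) is to pass `scientific = FALSE` to `format()` instead of changing the global `scipen` option, and `trim = TRUE` so that shorter results are not padded with leading spaces. The input vector `x` here is made up for illustration.

```r
# Made-up input mixing K and M suffixes.
x <- c("1.00K", "1.00M", "2.5K")

# K -> e3 and M -> e6, then parse as numeric.
num <- as.numeric(sub("M", "e6", sub("K", "e3", x, fixed = TRUE), fixed = TRUE))

# scientific = FALSE avoids "1e+06"; trim = TRUE avoids padding to a common width.
pretty <- format(num, big.mark = ",", scientific = FALSE, trim = TRUE)
pretty
```

Note that the result is character again, so it suits display; for comparisons keep the numeric `num`.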

- I can see that the `e3` and `e6` values are controlling where the decimal point lands. However, while I can manipulate it (which is very useful), I'm afraid that I don't understand it. Is this a value for the `replacement` argument of `sub`? I'm obviously not searching the documentation correctly. – ncraig Feb 20 '22 at 20:59
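
On the question in the comment above: `e3` and `e6` are not special values of `sub`'s `replacement` argument. They are ordinary text that happens to turn the string into scientific notation, which `as.numeric()` understands: `"1.00e3"` reads as 1.00 × 10^3. A short sketch to check this:

```r
# sub() just swaps text; the result is still a character string.
s <- sub("K", "e3", "1.00K", fixed = TRUE)
s                               # "1.00e3"

# as.numeric() parses scientific notation: mantissa times 10^exponent.
as.numeric(s)                   # 1000
as.numeric(s) == 1.00 * 10^3    # TRUE

# The same notation works for numeric literals typed in R code.
1e6 == 1000000                  # TRUE
```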
The `stringr` library should address this. Try the following:
# load library
library(stringr)
# construct a vector requiring the change
foo <- c("1.00K", "bar")
foo
# replace values
foo <- str_replace_all(foo, pattern = "1.00K", replacement = "1,000")
foo
To make the other changes, like converting "1.00M" to "1,000,000", simply alter the values for the `pattern =` and `replacement =` arguments. When cleaning data, I often assemble all of these cleaning steps in a separate R script that gets called early in my R Markdown document.

- This wouldn't work in the general case of a column of arbitrary numbers like `c("1.00K", "2.01K", "3.78M")` etc. – Allan Cameron Feb 20 '22 at 19:42
- I suppose you are correct, as one would need a `str_replace_all` call for each unique number. Perhaps with a regex: if the number ends with K, remove the decimal point and tack on three 0's, etc.? Maybe that would be a more generally applicable approach for a long column of numbers like this. – ncraig Feb 20 '22 at 19:48
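
A rough sketch of the regex idea from this comment thread, using the vector from the first comment. Rather than removing the decimal point and appending zeros by hand, it rewrites a trailing `K` or `M` as an exponent and lets `as.numeric()` do the arithmetic. The anchored patterns are an assumption about the data, namely that each value is a plain number with at most one trailing suffix.

```r
# Vector from the comment above.
vals <- c("1.00K", "2.01K", "3.78M")

# Anchored patterns: only a K or M that directly follows the number is rewritten.
sci <- sub("^([0-9.]+)K$", "\\1e3", vals)
sci <- sub("^([0-9.]+)M$", "\\1e6", sci)

nums <- as.numeric(sci)    # 1000, 2010, 3780000
labels <- format(nums, big.mark = ",", scientific = FALSE, trim = TRUE)
labels                     # "1,000" "2,010" "3,780,000"
```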