3

Can you change "1.00K" to "1,000" or "1.00M" to "1,000,000" in R? The values are currently stored as character strings.

Leah
  • 2
    https://stackoverflow.com/questions/71157381/thousand-separator-to-numeric-columns-in-r/71172863#71172863??? – Onyambu Feb 20 '22 at 19:20
  • @Onyambu It looks like that is the reverse of what I am trying to do. – Leah Feb 20 '22 at 19:23
  • Do you need your results to be numeric or character? – Onyambu Feb 20 '22 at 19:25
  • Also, how do you end up with `M` and `K`? Are you in any way formatting from numeric to a character that has M and K, and then you want to revert back? – Onyambu Feb 20 '22 at 19:27
  • @Onyambu Numeric, please. We need it for comparisons, e.g., it makes it easier to compare "1.00M" to "1.00K". – Leah Feb 20 '22 at 19:28
  • @Onyambu The M and the K were provided in the dataset. – Leah Feb 20 '22 at 19:28
  • Welcome to SO. While your question seems clear in what you are asking, it is generally good practice to provide a reproducible example (reprex). Also, without constructing a wall of text, it is helpful to disclose what approaches you tried. – ncraig Feb 20 '22 at 19:37

4 Answers

4

If you need the result as numeric, you could do it with regular expressions:

numbers <- c("5.00K", "1.00M", "100", "3.453M")

as.numeric(sub("^(\\d+\\.?\\d*).*$", "\\1", numbers)) *
  ifelse(grepl("K", numbers), 1000, 1) * 
  ifelse(grepl("M", numbers), 1e6, 1)
#> [1]    5000 1000000     100 3453000
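
The same idea can be wrapped in a small function with a lookup table, so that further suffixes only need a new table entry. This is a minimal sketch, not part of the answer above: the name `parse_suffixed` and the `multipliers` argument are hypothetical, and it assumes each string is a plain number followed by at most one suffix with no space in between.

```r
# Hypothetical helper: the name and the suffix table are illustrative only
parse_suffixed <- function(x, multipliers = c(K = 1e3, M = 1e6)) {
  # Leading number, e.g. "5.00" from "5.00K"
  value <- as.numeric(sub("^(\\d+\\.?\\d*).*$", "\\1", x))
  # Whatever follows the number, e.g. "K"; empty when there is no suffix
  suffix <- sub("^\\d+\\.?\\d*", "", x)
  # Look up the multiplier; strings without a suffix are scaled by 1
  scale <- ifelse(suffix == "", 1, multipliers[suffix])
  unname(value * scale)
}

numbers <- c("5.00K", "1.00M", "100", "3.453M")
result <- parse_suffixed(numbers)
result
```

A suffix that is missing from the table (for example "2B" with the default table) comes back as `NA` rather than being silently treated as 1, which makes unexpected values easier to spot.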
Allan Cameron
4

We may also do this by replacing 'K' and 'M' with e3 and e6, respectively, using str_replace_all, and then converting directly to numeric

library(stringr)
as.numeric(str_replace_all(str1, setNames(c("e3", "e6"), c("K", "M"))))
[1]    5000 1000000     100 3453000

data

str1 <- c("5.00K", "1.00M", "100", "3.453M")
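
If stringr is not available, roughly the same substitution can be done in base R. The sketch below is an assumption-laden variant, not the code above: it nests two `sub()` calls and anchors each suffix to the end of the string with `$`, so only a trailing K or M is rewritten.

```r
str1 <- c("5.00K", "1.00M", "100", "3.453M")

# Rewrite a trailing K as e3 and a trailing M as e6 (scientific notation)
sci <- sub("M$", "e6", sub("K$", "e3", str1))
sci

# as.numeric() understands strings such as "5.00e3"
nums <- as.numeric(sci)
nums
```

Strings that still contain a letter after the substitution would be turned into `NA` by `as.numeric()`, with a coercion warning.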
akrun
2

Here is another approach:

x <- "1.00K"
format(as.numeric(sub("K", "e3", x, fixed = TRUE)), big.mark = ",")
[1] "1,000"

options(scipen = 100)  # otherwise format() would print 1e+06 instead of 1,000,000
y <- "1.00M"
format(as.numeric(sub("M", "e6", y, fixed = TRUE)), big.mark=",")
[1] "1,000,000"
  • Explanation:

sub("K", "e3", x, fixed = TRUE) gives "1.00e3" (i.e., K is replaced by e3, which is scientific notation for "times 10^3")

and adding as.numeric(..):

as.numeric("1.00e3") gives 1000

and

wrapping it in format(..., big.mark = ","):

format(as.numeric(sub("K", "e3", x, fixed = TRUE)), big.mark = ",") gives "1,000" (a character string)

  • Now the same procedure for M, but here we need e6
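
For a whole vector, the two steps can be combined. This is a sketch under a few assumptions: the input vector `x` is made up for illustration, `scientific = FALSE` is used in place of the global `options(scipen = 100)`, and `trim = TRUE` is added so the results are not padded to a common width.

```r
x <- c("1.00K", "1.00M", "100")

# Replace both suffixes, then convert to numeric
num <- as.numeric(sub("M", "e6", sub("K", "e3", x, fixed = TRUE), fixed = TRUE))

# Format with thousands separators; the result is a character vector
formatted <- format(num, big.mark = ",", scientific = FALSE, trim = TRUE)
formatted
```

Note that `num` is the numeric result suited to comparisons, while `formatted` is only for display.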
TarJae
  • I can see that the `e3` and `e6` values are controlling where the decimal point lands. However, while I can manipulate it (which is very useful), I'm afraid that I don't understand it. Is this a value for the `replacement` argument of `sub`? I'm obviously not searching the documentation correctly. – ncraig Feb 20 '22 at 20:59
  • 1
    Please see my update with the explanation. – TarJae Feb 21 '22 at 06:29
0

The stringr library should address this. Try the following:

# load library
library(stringr)

# construct a vector requiring the change
foo <-  c("1.00K", "bar")
foo

# replace values
foo <- str_replace_all(foo, pattern = "1.00K", replacement = "1,000")
foo

To make the other changes, like converting "1.00M" to "1,000,000", simply alter the value for the replacement = argument. When cleaning data, I often assemble all of these cleaning steps in a separate R script that gets called early in my R Markdown document.

ncraig
  • 1
    This wouldn't work in the general case of a column of arbitrary numbers like `c("1.00K", "2.01K", "3.78M")` etc – Allan Cameron Feb 20 '22 at 19:42
  • I suppose you are correct, as one would need a `str_replace_all` call for each unique number. Perhaps with regexes: if the number ends with K, remove the decimal point and tack on three zeros, etc.? Maybe that would be a more generally applicable approach for a long column of numbers like this. – ncraig Feb 20 '22 at 19:48
  • I see your answer does just this. Taking note...thanks. – ncraig Feb 20 '22 at 19:49