1

I have a data table from REGIS that includes in random columns, numeric amounts that are summarized by "$1.11 M"(as an example) instead of 1,110,000.00 as a number. In these random columns, it could either show as a "M" or "B" or "K" ( for Millions, Billions, or Thousands).

I am trying to figure out a code that will find and remove the "$" and "M", then multiply the numeric number by 1,000,000 (or whatever the the dollar amount should be).

I have tried using lapply, if_else, Gsub... but I can't figure out how to only make these changes to the cells that have the characters "$" and "M" (or "B"," K")

So far i have:

df1$1<-sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"), 
    sub("$", "",df1$1, fixed = TRUE)), function(x) eval(parse(text = x)))

The problem is that I am looking for a way to make this code cover all of my columns at once.. just not a specific column.

I expect to change (for example) "$1.65 K" to be 1,650.00 or "3.96 M" to 3,960,000.00

bbiasi
  • 1,549
  • 2
  • 15
  • 31
J. Shields
  • 33
  • 5
  • 1
    Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos May 15 '19 at 14:51
  • Also [this question](https://stackoverflow.com/questions/38013217/convert-from-billion-to-million-and-vice-versa) might help – Sotos May 15 '19 at 14:52

1 Answers1

0

We can use gsubfn to replace the 'M', 'K' values with a multiplier string by passing a key/val list, then evaluate the string and change it to custom format with dollar_format from scales

library(gsubfn)
library(scales)
out <- sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6"), 
    sub("$", "", str1, fixed = TRUE)), function(x) eval(parse(text = x)))
dollar_format()(out)
#[1] "$1,650"     "$3,960,000"

If we need to change for all the columns

df1[] <-lapply(df1, function(x)  dollar_format()(sapply(gsubfn("[A-Z]",
        list(K = "*1000", M = "*1e6", B = "*1e9"), 
    sub("$", "", x, fixed = TRUE)), function(y) eval(parse(text = y)))))

Update

OP came up with some more patterns. For that, we can do

df2[] <- lapply(df2, function(x) {
     i1 <- grepl("[KMB]", x) # create index to change only those have KMB 
     x[i1] <- dollar_format()(
       sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"),
          sub("$", "", x[i1], fixed = TRUE)),
        function(y) eval(parse(text = y))))
      x
     })
df2
#        col1       col2
#1     $1,650     $1,650
#2 $3,960,000 $3,960,000
#3    $-1,650    $-1,650
#4       -950       -950

data

str1 <- c("$1.65 K" ,  "3.96 M")
df1 <- data.frame(col1 = str1, col2 = str1, stringsAsFactors = FALSE)

str2 <- c("$1.65 K" ,  "3.96 M", "-$1.65 K", "-950")
df2 <- data.frame(col1 = str2, col2 = str2, stringsAsFactors = FALSE)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • If the table I am using is called "DF", where would that fit into the script you provided? Sorry i am pretty new to R. – J. Shields May 15 '19 at 15:00
  • Am i able to use this code when dealing with a data table instead of a string of values? – J. Shields May 15 '19 at 15:05
  • @J.Shields Yes you can use `DF$columnname` – akrun May 15 '19 at 15:23
  • First off, the code works great! Is there a way I can get it to check all of my columns at once? the table has 374 columns.. all of which could have the formatting that I am trying to change. – J. Shields May 15 '19 at 15:30
  • @J.Shields Yes, if it is a data.frame `df1[] <- lapply(df1, function(x) ...)` – akrun May 15 '19 at 15:31
  • Adding your above comment to the full code... is this correct?df1[]<-lapply(df1, function(x) sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"), + sub("$", "", test_main_df, fixed = TRUE)), function(x) eval(parse(text = x)))) – J. Shields May 15 '19 at 15:50
  • I think the `sub("$","", x, fixed = TRUE)` where 'x' is the anonymous function call for the column element. In the second `function(x)`, it is for the `sapply` loop. I would name it as `y` instead of 'x' to avoid confusion – akrun May 15 '19 at 15:54
  • So this is what should work? I tried running and it doesn't look to be working.. df1[]<-lapply(df1, function(x) sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"), sub("$", "", df1, fixed = TRUE)), function(y) eval(parse(text = y)))) – J. Shields May 15 '19 at 16:10
  • @J.Shields In the `sub` call, you were using the whole data 'df1', instead of 'x'. Try `lapply(df1, function(x) dollar_format()(sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"), sub("$", "", x, fixed = TRUE)), function(y) eval(parse(text = y)))))` – akrun May 15 '19 at 18:06
  • Thanks so much! this is great. The (hopefully) last issue that i have is that there are negative numbers in the table, so when i run the script i get this error: `Error in parse(text = y) : :2:0: unexpected end of input 1: - ^ ` Is there a way it can ignore the "-"? – J. Shields May 16 '19 at 12:57
  • @J.Shields Can you show an example. Is it like `-$1.65 K"` if you want to ignore, then do `{i1 <- !grepl("^-", x); x1 <- x[i1];` and then update the elements that don't have the `-` – akrun May 16 '19 at 13:51
  • So I found instances where there are numbers like "-$1.65 k" that I would like changed to "-1,650.00". But at the same time I found amounts that are "-950", and i would like that to stay the way it is. With your code above, where would i enter that into the script? Do i need a closing "}" added to it as well? – J. Shields May 16 '19 at 14:06