How to Find, Replace, and Multiply in a table

Question

I have a data table from REGIS that includes in random columns, numeric amounts that are summarized by "$1.11 M"(as an example) instead of 1,110,000.00 as a number. In these random columns, it could either show as a "M" or "B" or "K" ( for Millions, Billions, or Thousands).

I am trying to figure out a code that will find and remove the "$" and "M", then multiply the numeric number by 1,000,000 (or whatever the the dollar amount should be).

I have tried using lapply, if_else, Gsub... but I can't figure out how to only make these changes to the cells that have the characters "$" and "M" (or "B"," K")

So far i have:

df1$1<-sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"), 
    sub("$", "",df1$1, fixed = TRUE)), function(x) eval(parse(text = x)))

The problem is that I am looking for a way to make this code cover all of my columns at once.. just not a specific column.

I expect to change (for example) "$1.65 K" to be 1,650.00 or "3.96 M" to 3,960,000.00

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — Sotos, May 15 '19 at 14:51
Also [this question](https://stackoverflow.com/questions/38013217/convert-from-billion-to-million-and-vice-versa) might help — Sotos, May 15 '19 at 14:52

akrun · Accepted Answer · 2019-05-16T15:08:44.053

0

We can use gsubfn to replace the 'M', 'K' values with a multiplier string by passing a key/val list, then evaluate the string and change it to custom format with dollar_format from scales

library(gsubfn)
library(scales)
out <- sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6"), 
    sub("$", "", str1, fixed = TRUE)), function(x) eval(parse(text = x)))
dollar_format()(out)
#[1] "$1,650"     "$3,960,000"

If we need to change for all the columns

df1[] <-lapply(df1, function(x)  dollar_format()(sapply(gsubfn("[A-Z]",
        list(K = "*1000", M = "*1e6", B = "*1e9"), 
    sub("$", "", x, fixed = TRUE)), function(y) eval(parse(text = y)))))

Update

OP came up with some more patterns. For that, we can do

df2[] <- lapply(df2, function(x) {
     i1 <- grepl("[KMB]", x) # create index to change only those have KMB 
     x[i1] <- dollar_format()(
       sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"),
          sub("$", "", x[i1], fixed = TRUE)),
        function(y) eval(parse(text = y))))
      x
     })
df2
#        col1       col2
#1     $1,650     $1,650
#2 $3,960,000 $3,960,000
#3    $-1,650    $-1,650
#4       -950       -950

data

str1 <- c("$1.65 K" ,  "3.96 M")
df1 <- data.frame(col1 = str1, col2 = str1, stringsAsFactors = FALSE)

str2 <- c("$1.65 K" ,  "3.96 M", "-$1.65 K", "-950")
df2 <- data.frame(col1 = str2, col2 = str2, stringsAsFactors = FALSE)

edited May 16 '19 at 15:08

answered May 15 '19 at 14:55

akrun

874,273
37
540
662

If the table I am using is called "DF", where would that fit into the script you provided? Sorry i am pretty new to R. – J. Shields May 15 '19 at 15:00
Am i able to use this code when dealing with a data table instead of a string of values? – J. Shields May 15 '19 at 15:05
@J.Shields Yes you can use `DF$columnname` – akrun May 15 '19 at 15:23
First off, the code works great! Is there a way I can get it to check all of my columns at once? the table has 374 columns.. all of which could have the formatting that I am trying to change. – J. Shields May 15 '19 at 15:30
@J.Shields Yes, if it is a data.frame `df1[] <- lapply(df1, function(x) ...)` – akrun May 15 '19 at 15:31
Adding your above comment to the full code... is this correct?df1[]<-lapply(df1, function(x) sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"), + sub("$", "", test_main_df, fixed = TRUE)), function(x) eval(parse(text = x)))) – J. Shields May 15 '19 at 15:50
I think the `sub("$","", x, fixed = TRUE)` where 'x' is the anonymous function call for the column element. In the second `function(x)`, it is for the `sapply` loop. I would name it as `y` instead of 'x' to avoid confusion – akrun May 15 '19 at 15:54
So this is what should work? I tried running and it doesn't look to be working.. df1[]<-lapply(df1, function(x) sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"), sub("$", "", df1, fixed = TRUE)), function(y) eval(parse(text = y)))) – J. Shields May 15 '19 at 16:10
@J.Shields In the `sub` call, you were using the whole data 'df1', instead of 'x'. Try `lapply(df1, function(x) dollar_format()(sapply(gsubfn("[A-Z]", list(K = "*1000", M = "*1e6", B = "*1e9"), sub("$", "", x, fixed = TRUE)), function(y) eval(parse(text = y)))))` – akrun May 15 '19 at 18:06
Thanks so much! this is great. The (hopefully) last issue that i have is that there are negative numbers in the table, so when i run the script i get this error: `Error in parse(text = y) : :2:0: unexpected end of input 1: - ^ ` Is there a way it can ignore the "-"? – J. Shields May 16 '19 at 12:57
@J.Shields Can you show an example. Is it like `-$1.65 K"` if you want to ignore, then do `{i1 <- !grepl("^-", x); x1 <- x[i1];` and then update the elements that don't have the `-` – akrun May 16 '19 at 13:51
So I found instances where there are numbers like "-$1.65 k" that I would like changed to "-1,650.00". But at the same time I found amounts that are "-950", and i would like that to stay the way it is. With your code above, where would i enter that into the script? Do i need a closing "}" added to it as well? – J. Shields May 16 '19 at 14:06

How to Find, Replace, and Multiply in a table

1 Answers1

Update

data