6

Suppose I have the following data frame named DF. I would like to convert all the values in the Revenue column to the same unit.

Brands   Revenue
A        50.1 bn
B        41.2 bn
C        32.5 Mn
D        15.1 bn

Please note that bn and Mn are part of the vectors.

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
Sourabh Banik
  • 53
  • 1
  • 4

6 Answers6

8

One idea,

new <- ifelse(gsub('.*\\s+', '', DF$Revenue) == 'bn',
              as.numeric(gsub('[A-Za-z]', '', DF$Revenue))*1000, DF$Revenue)

new[!grepl('Mn', new)] <- paste(new[!grepl('Mn', new)], 'Mn', sep = ' ')
DF$Revenue <- new

DF
#  Brands  Revenue
#1      A 50100 Mn
#2      B 41200 Mn
#3      C  32.5 Mn
#4      D 15100 Mn

To do the opposite then,

new <- ifelse(gsub('.*\\s+', '', DF$Revenue) == 'Mn',
               as.numeric(gsub('[A-Za-z]', '', DF$Revenue))/1000, DF$Revenue)

 new[!grepl('bn', new)] <- paste(new[!grepl('bn', new)], 'bn', sep = ' ')
 DF$Revenue <- new
 DF
#  Brands   Revenue
#1      A   50.1 bn
#2      B   41.2 bn
#3      C 0.0325 bn
#4      D   15.1 bn
Sotos
  • 51,121
  • 6
  • 32
  • 66
4

Another method: separate the monetary value from the text using split:

# split value and "level" in a list
temp <- split(df$Revenue, split=" ")
# add separately to data.frame
df$Revenue <- sapply(temp, function(i) as.numeric(i[[1]]))
df$level <- sapply(temp, function(i) "[", 2)

df
  Brands Revenue level
1      A 50100.0    bn
2      B 41200.0    bn
3      C    32.5    bn
4      D 15100.0    bn

Now, convert to millions subsetting on the levels with "bn":

df$Revenue[df$level == "bn"] <- df$Revenue[df$level == "bn"] * 1000
df$level <- "Mn"

This results in

df
  Brands Revenue level
1      A  0.0501    Mn
2      B  0.0412    Mn
3      C 32.5000    Mn
4      D  0.0151    Mn

Instead convert to billions (a similar procedure)

df$Revenue[df$level == "Mn"] <- df$Revenue[df$level == "Mn"] / 1000
df$level <- "bn"

This results in

df
  Brands Revenue level
1      A  0.0501    bn
2      B  0.0412    bn
3      C 32.5000    bn
4      D  0.0151    bn
lmo
  • 37,904
  • 9
  • 56
  • 69
4

To maybe simplify the parsing procedure compared to the previous solutions. I am using the awesome library stringr:

library(stringr)

dd$units <- word(dd$Revenue, 2, sep = " ")
dd$amounts <- word(dd$Revenue, 1, sep = " ")


# The following lines create an extra column in the dataframe,
# You can overwrite the original column if you so wish.

# Convert to billions
dd$convert_to_bn <- paste(as.numeric(dd$amounts) * ifelse(dd$units == "bn", 1 , 0.001), "bn")

# Convert to millions
dd$convert_to_mn <- paste(as.numeric(dd$amounts) * ifelse(dd$units == "Mn", 1 , 1000), "Mn")
thepule
  • 1,721
  • 1
  • 12
  • 22
3

This is a solution that replaces the "units" by appropriate factors and evaluates the resulting calculations.

The first step is to replace "bn" and "Mn" by a factor:

conversion <- c(Mn = 1/1000, bn = 1)
for (unit in names(conversion)) {
  df$Revenue <- gsub(unit, paste0("*", conversion[unit]), df$Revenue)
}
df
##   CBrands             Revenue
## 1       A             50.1 *1
## 2       B             41.2 *1
## 3       C         32.5 *0.001
## 4       D             15.1 *1

Then evaluate the expressions in Revenue and "bn" again:

df$Revenue <- sapply(df$Revenue, function(x) eval(parse(text = x)))
df$Revenue <- paste(df$Revenue, "bn")
df
##   CBrands   Revenue
## 1       A   50.1 bn
## 2       B   41.2 bn
## 3       C 0.0325 bn
## 4       D   15.1 bn
Stibu
  • 15,166
  • 6
  • 57
  • 71
  • Although this works, it's worth noting that an `apply()` based solution will not scale as well as a vectorized solution as the number of rows increases, particularly when each iteration of the loop is calling `eval(parse(...))`. – C8H10N4O2 Aug 22 '17 at 13:33
2

We can also do this with gsubfn. Replace the 'bn', 'Mn' with * 1, * 1/1000, evaluate the string and paste with 'bn'.

library(gsubfn)
sprintf("%.2f bn", sapply(gsubfn("([[:alpha:]]+)", list(Mn = "* 1/1000", 
        bn = "* 1"), df1$Revenue), function(x) eval(parse(text=x))))
#[1] "50.10 bn" "41.20 bn" "0.03 bn"  "15.10 bn"
akrun
  • 874,273
  • 37
  • 540
  • 662
0

You should consider whether you really want to store revenue as text in your data. That will make it harder to do any sort of computations on revenue. You may find it better to store revenue as a float and write a custom format to display it with a suffix. The unit of measurement could be ones, billions, or anything else.

Depending on what you decide, here are two approaches using the popular "tidyverse" packages. In both approaches, use tidyr's separate(..., sep='\\s',convert=TRUE) to split your revenue text into a number and a unit.

Keeping revenue stored as text (not best practice)

library(tidyr)
library(dplyr)
DF %>% 
  separate(Revenue, into=c('Rev.Amt','Rev.Denom'), sep='\\s', convert=TRUE) %>%
  mutate( Rev.Amt = Rev.Amt/ifelse(Rev.Denom=='Mn',1000,1), # other conversions as needed
          Rev.Denom = 'bn' ) %>%
  unite(Revenue, Rev.Amt, Rev.Denom, sep=' ')
# Brands   Revenue
# 1      A   50.1 bn
# 2      B   41.2 bn
# 3      C 0.0325 bn
# 4      D   15.1 bn

Store revenue as numeric (better practice)

Here we'll store revenue as units without a multiple, but you could also store it as billions and avoid the step of division upon display.

DF %<>% 
  separate(Revenue, into=c('Rev.Amt','Rev.Denom'), sep='\\s', convert=TRUE) %>%
  mutate( Rev.Amt = Rev.Amt*ifelse(Rev.Denom=='Mn', 1e6, 1e9)) %>% # other conv as needed
  select(-Rev.Denom)

# To display numeric revenue in billions
showInBn <- function(x) paste(x/1e9,'bn')
DF %>% mutate(Rev.Expr = showInBn(Rev.Amt)) %>% select(-Rev.Amt)
#   Brands  Rev.Expr
# 1      A   50.1 bn
# 2      B   41.2 bn
# 3      C 0.0325 bn
# 4      D   15.1 bn
C8H10N4O2
  • 18,312
  • 8
  • 98
  • 134