I want to convert all the revenue values in a data frame to billions and then calculate their mean.

DF1 <- data.frame("Brand"=c("a","b","c","d","e","f"),"Revenue"=c("$50.21 M","$20.31 B","$50.23 M","$41.45 B","$29.10 M","$32.21 M"))
show(DF1)

temp<-as.numeric(gsub("^[[:punct:]]", "",DF1$Revenue))
temp

temp_num<- as.numeric(as.character(DF1$Revenue))

Warning message: NAs introduced by coercion

Emma

2 Answers

One option is to extract the numeric part with parse_number, convert the values to billions by dividing those with an "M" suffix by 1000, and then take the mean of the 'Revenue' column.

library(dplyr)
library(stringr)
DF1 %>% 
  mutate(Revenue = readr::parse_number(as.character(Revenue)) * 
          c(1, 1/1e3)[str_detect(Revenue, "M") + 1]) %>%
  summarise(Mean = mean(Revenue))
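
For comparison, the same steps can be sketched in base R without any packages. This is only a minimal sketch: the names `num`, `in_billions` and `mean_billions` are illustrative, and it assumes every value ends in either "M" or "B", as in the sample data.

```r
DF1 <- data.frame("Brand" = c("a", "b", "c", "d", "e", "f"),
                  "Revenue" = c("$50.21 M", "$20.31 B", "$50.23 M",
                                "$41.45 B", "$29.10 M", "$32.21 M"))

# Drop everything except digits and the decimal point, then coerce.
# as.character() guards against Revenue being a factor in older R versions.
num <- as.numeric(gsub("[^0-9.]", "", as.character(DF1$Revenue)))

# Values in millions are divided by 1000; values in billions are kept as-is.
in_billions <- ifelse(grepl("M", DF1$Revenue, fixed = TRUE), num / 1e3, num)

mean_billions <- mean(in_billions)
mean_billions
```

With the sample data this should agree with the dplyr result above, since both divide only the "M" rows by 1000.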
akrun

After your gsub(), the column can be split on the space with strsplit(). We then convert the second column (the suffix) into a factor and relabel "B" and "M" as the numeric multipliers 1e9 and 1e6. Finally we coerce both columns to numeric, take the row-wise products, and divide their mean by 1e9 (using American billions).

s <- do.call(rbind.data.frame, strsplit(gsub("^[[:punct:]]", "", DF1$Revenue), " "))
s[, 2] <- factor(s[, 2], labels=c("1e9", "1e6"))
res <- mean(apply(s, 1, function(x) prod(as.numeric(as.character(x)))))/1e9
# [1] 10.32029

For a final output we may use formatC.

formatC(res, format="f", big.mark=",", digits=2)
# [1] "10.32"
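
Note that factor(..., labels=) relies on the levels being sorted alphabetically ("B" before "M"). A possible variant, sketched here under the assumption that only the suffixes "M" and "B" occur, uses a named lookup vector so the mapping is explicit. The names `multiplier`, `parts`, `value`, `unit` and `res2` are illustrative and not part of the code above.

```r
DF1 <- data.frame("Brand" = c("a", "b", "c", "d", "e", "f"),
                  "Revenue" = c("$50.21 M", "$20.31 B", "$50.23 M",
                                "$41.45 B", "$29.10 M", "$32.21 M"))

# Explicit suffix-to-multiplier mapping (assumed suffixes: "M" and "B" only)
multiplier <- c(M = 1e6, B = 1e9)

# Strip the leading "$" and split each entry into number and suffix
parts <- strsplit(sub("^\\$", "", as.character(DF1$Revenue)), " ", fixed = TRUE)
value <- as.numeric(vapply(parts, `[`, character(1), 1))
unit  <- vapply(parts, `[`, character(1), 2)

# Scale to raw dollars via the lookup, then express the mean in billions
res2 <- mean(value * unname(multiplier[unit])) / 1e9
res2
```

An unexpected suffix would produce NA here rather than a silently wrong multiplier, which makes bad input easier to spot.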
jay.sf