
I am analysing some data that includes a wage variable. Its values contain the symbol '€' and a suffix of either 'M' or 'K'.

I tried to use the gsub() function to clean this up, but my code doesn't work:

Integer_converter <- function(strWage) { 
  Factor_Wage = gsub("€", " ", strWage)
}

Factor_converter_1 <- function(strWage) {
  Integer_Wage = gsub("M", " ", strWage)
}

Factor_converter_2 <- function(strWage) {
  Integer_wage = as.integer(as.integer(gsub("K", "", strWage)) / 100) 
}

The actual values are listed as follows:

$ Wage /fct/ €405K, €195K, €205K, €240K, €175K, €25K, €205K, €57K, €140K, €135K, €15K, €45K, €40K, €76K, €17K, €125K, …

and I want to convert it into

$ Wage /int/ 0.405, 0.195, 0.205, 0.240, 0.175, 0.025, 0.205, 0.057, 0.140, 0.135, 0.015, 0.045, 0.040, 0.076, 0.017, 0.125, …

Girim Ban
  • In my world K == 1000 – IRTFM Mar 31 '19 at 20:59
  • I nominated for closing but only later realized that the "duplicate" was making the converse coercion: https://stackoverflow.com/questions/28159936/formatting-large-currency-or-dollar-values-to-millions-billions . I rather suspect there will be a duplicate but having failed at my first attempt, SO will not let me make another dupe nomination. – IRTFM Mar 31 '19 at 21:07

1 Answer

We can use parse_number from readr to extract the number and divide by 1000.

library(readr)
parse_number(as.character(df1$Wage))/1000
#[1] 0.405 0.195 0.205 0.240 0.175 0.025 0.205 0.057 0.140 
#[10] 0.135 0.015 0.045 0.040 0.076 0.017 0.125

parse_number() drops the "€" prefix and the "K" suffix and returns the numeric part, which we then divide by 1000.
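Since the question started from gsub(), a base R version of the same idea may be useful for comparison. This is only a minimal sketch: the `wage` vector below is a small illustrative sample rather than the real data, and it assumes every value ends in "K".

```r
# Illustrative sample (not the real data); a factor, as in the question
wage <- factor(c("€405K", "€195K", "€25K"))

# Remove everything that is not a digit or a decimal point,
# convert to numeric, then rescale from thousands to millions
wage_num <- as.numeric(gsub("[^0-9.]", "", as.character(wage))) / 1000
wage_num
```

Converting the factor with as.character() first matters: calling as.numeric() directly on a factor would return its internal level codes, not the wage values.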


The same can be done in a tidyverse chain:

library(dplyr)
library(readr)  # provides parse_number()
df1 %>%
   mutate(Wage = parse_number(as.character(Wage))/1000)

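If some values end in "M" as well as "K", we can use gsubfn to replace each suffix with the matching arithmetic and then evaluate the resulting expression: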

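library(gsubfn)
# Turn each suffix into arithmetic that yields millions
# ("405K" -> "405/1e3", "15M" -> "15*1"), then evaluate each string
unname(sapply(gsubfn("[A-Z]", list(K = '/1e3', M = '*1'), 
       sub("€", "", df2$Wage)), function(x) eval(parse(text = x))))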
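A base R alternative that avoids eval(parse()) is to look up a scale factor for each suffix. This is a sketch under stated assumptions: the `wage` sample, the `scale` lookup vector, and the choice to report everything in millions (so "K" scales by 1/1000 and "M" is left unscaled) are illustrative and not part of the original answer.

```r
# Illustrative sample (not the real data) mixing "K" and "M" suffixes
wage <- c("€405K", "€15M", "€57K")

# Assumed scale factors: results in millions, so K -> 1/1000 and M -> 1
scale <- c(K = 1e-3, M = 1)

suffix <- sub(".*([KM])$", "\\1", wage)           # trailing unit letter
value  <- as.numeric(gsub("[^0-9.]", "", wage))   # numeric part only

# Look up each suffix by name; a value without a K/M suffix would give NA
wage_millions <- value * unname(scale[suffix])
wage_millions
```

The named vector makes it easy to add further suffixes or switch the target unit by editing `scale` alone.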

data

df1 <- data.frame(Wage = c("€405K", "€195K", "€205K", "€240K", "€175K",
  "€25K", "€205K", "€57K",  "€140K", "€135K", "€15K", "€45K",
     "€40K", "€76K", "€17K", "€125K"))

df2 <- data.frame(Wage = c("€405K", "€195K", "€205K", "€240K", "€175K",
  "€25K", "€205K", "€57K",  "€140K", "€135K", "€15M", "€45K",
     "€40K", "€76K", "€17M", "€125K"))
akrun