Converting a factor to numeric, with decimal points and thousands (K) and millions (M) abbreviations

Question

I'm trying to convert a column of money amounts to numeric values. A very simplified version of my database would be:

SoccerPlayer = c("A","B","C","D","E")
Value = c("10K","25.5K","1M","1.2M","0")
database = data.frame(SoccerPlayer,Value)
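The title mentions factors because, before R 4.0.0, `data.frame()` used `stringsAsFactors = TRUE` by default, so `database$Value` would be a factor rather than a character vector. The sketch below illustrates the classic pitfall; the object names are only illustrative. Calling `as.numeric()` directly on a factor returns its internal level codes, so the labels should be taken with `as.character()` before any string processing.

```r
# Mimic the pre-R-4.0.0 default by building Value as a factor explicitly
Value_f <- factor(c("10K", "25.5K", "1M", "1.2M", "0"))

# as.numeric() on a factor yields the integer level codes (1..5), not the labels
codes <- as.numeric(Value_f)

# as.character() recovers the original labels for string processing
labels <- as.character(Value_f)
print(labels)
```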

I'm facing the following issues. If there were no decimal points and all money amounts used the same unit, e.g. only K (thousands) or only M (millions), this would work perfectly:

library(stringi)
database$Value = as.numeric(gsub("K","000",database$Value))

But since my data contains both K and M values, I tried to write it like this:

library(stringi)

if(stri_sub(database$Value,-1,-1) == 'M'){
  database$Value = gsub("M","000000",database$Value)
}

if(stri_sub(database$Value,-1,-1) == 'K'){
  database$Value = gsub("K","000",database$Value) 
}

as.numeric(database$Value)

This produces the following warning messages:

Warning message:
In if (stri_sub(database$Value, -1, -1) == "M") { :
  the condition has length > 1 and only the first element will be used

Warning message:
In if (stri_sub(database$Value, -1, -1) == "K") { :
  the condition has length > 1 and only the first element will be used

Warning message:
NAs introduced by coercion

Looking at the data after this procedure, it looks like this:

> print(database$Value)
[1] "10000"   "25.5000" "1M"      "1.2M"    "0"

Only the K (thousands) values were converted. I also don't know how to handle the decimal points: "25.5K" became "25.5000", and "1.2M" would have become "1.2000000" had the M conversion worked, and both of these parse as the wrong number.
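The decimal problem comes from treating the suffix as text: replacing "K" with "000" appends digits after the decimal point instead of scaling the number. A short illustration, assuming a plain character input, compares that substitution with the arithmetic alternative of stripping the suffix, parsing the remaining digits with `as.numeric()`, and multiplying by the scale.

```r
x <- "25.5K"

# Text substitution: "25.5K" becomes "25.5000", which parses as 25.5
by_substitution <- as.numeric(gsub("K", "000", x))

# Arithmetic: drop the trailing K, parse 25.5, then scale by 1000
by_multiplication <- as.numeric(sub("K$", "", x)) * 1e3

print(c(by_substitution, by_multiplication))
```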

I'm new to programming, and any help or thoughts on how to solve this would be much appreciated.

Related from a couple of weeks ago - https://stackoverflow.com/questions/56159114/converting-unit-abbreviations-to-numbers which also links to previous question - https://stackoverflow.com/questions/36806215/convert-from-k-to-thousand-1000-in-r — thelatemail, Jun 03 '19 at 01:20
The warning is because `if`/`else` structures aren't vectorized, so only the first value in a vector that gets passed there is checked for "M", "K", or anything else. Use a vectorized version like `ifelse` — camille, Jun 03 '19 at 03:02
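To make this comment concrete: `if` inspects a single logical value, whereas `ifelse()` evaluates its test element-wise and returns a vector of the same length. (In R 4.2.0 and later, a condition of length greater than one in `if` is an error rather than the warning shown in the question.) A minimal vectorized rewrite of the question's suffix check might look like the sketch below; `suffix`, `multiplier` and `amount` are illustrative names, and base R's `substring()` stands in for `stri_sub()`.

```r
Value <- c("10K", "25.5K", "1M", "1.2M", "0")

# Last character of every element, computed for the whole vector at once
suffix <- substring(Value, nchar(Value))

# ifelse() is vectorized: each element gets its own multiplier
multiplier <- ifelse(suffix == "M", 1e6, ifelse(suffix == "K", 1e3, 1))

# Remove a trailing K or M, then scale the numeric part
amount <- as.numeric(sub("[KM]$", "", Value)) * multiplier
print(amount)
```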

score 0 · Answer 1 · answered Jun 03 '19 at 01:13

You can build a vector holding the multiplier that corresponds to each entry (1e6 for M, 1e3 for K, 1 otherwise; I use str_detect() for this, but there are several ways to do it), use str_remove() to remove the M and K from your initial vector, and then convert Value to numeric and multiply it by the created vector.

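library(stringr)

# Multiplier for each entry: 1e6 if it contains "M", 1e3 if it contains "K", otherwise 1
Value_unity <- ifelse(str_detect(Value, 'M'), 1e6, ifelse(str_detect(Value, 'K'), 1e3, 1))

# Remove the K/M suffix, convert the rest to numeric, and scale it
Value_new <- Value_unity * as.numeric(str_remove(Value, 'K|M'))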
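If you would rather avoid the stringr dependency, the same multiply-by-suffix idea can be sketched in base R with a named lookup vector. `parse_abbrev()` is a hypothetical helper name, and the sketch assumes each entry is a plain number optionally followed by a single upper-case K or M. It calls `as.character()` first, so it should also work on a column created with `stringsAsFactors = TRUE`.

```r
# Hypothetical helper: convert strings like "25.5K" or "1.2M" to numbers
parse_abbrev <- function(x) {
  x <- as.character(x)                            # handles factor input too
  scale <- c(K = 1e3, M = 1e6)                    # multiplier for each supported suffix
  suffix <- sub("^[0-9.]+", "", x)                # "" when there is no suffix
  mult <- ifelse(suffix == "", 1, scale[suffix])  # unknown suffixes give NA
  as.numeric(sub("[KM]$", "", x)) * mult
}

database <- data.frame(
  SoccerPlayer = c("A", "B", "C", "D", "E"),
  Value = c("10K", "25.5K", "1M", "1.2M", "0"),
  stringsAsFactors = TRUE
)
database$Value <- parse_abbrev(database$Value)
print(database)
```

An entry with an unrecognized suffix (for example "5B") ends up as NA, with a coercion warning, which makes such values easy to spot afterwards.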
