I have a dataset with a "value (in millions of USD)" column that I want to clean up. The entries are strings in different formats: some have a dollar sign and end in an M (e.g. "$1.3M"), some end in a K (e.g. "$450K"), and some I've already converted to proper numeric entries (e.g. 40 for 40 million USD).

I want to get rid of the $ and extract only the numeric value for each row, in millions.

I'm probably looking at some kind of split based on whether the value contains M or K, with an "ifelse" along the lines of: ifelse(PL$'VALUE (M)' contains M, extract.numeric from PL$'VALUE (M)', PL$'VALUE (M)' * 10^-3).

I haven't quite figured out the easiest way to do this in R, though. Help would be appreciated!

  • Use a regular expression to extract the numeric part: https://stackoverflow.com/questions/15451251/extract-numeric-part-of-strings-of-mixed-numbers-and-characters-in-r – Kaps Nov 18 '17 at 16:18

1 Answer

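You can use gsubfn to map each currency character to its numeric equivalent: "$" is dropped, and "M" and "K" become the exponents "e6" and "e3", so as.numeric can parse the result. Note that this returns plain USD; divide by 1e6 to get millions.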

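x <- c("$1.3M", "$450K")

library(gsubfn)
# Each non-digit character is looked up in the list: "$" is dropped,
# "M" becomes "e6", "K" becomes "e3"; anything else (e.g. ".") is kept.
as.numeric(
  gsubfn("\\D", list("$" = "", "M" = "e6", "K" = "e3"), x)
)
# [1] 1300000  450000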
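
Since the question asks for values in millions and some entries are already plain numbers, a base R sketch along these lines might cover all three cases without an extra package. The helper name to_millions and the sample vector are made up for illustration, and it assumes that entries without a suffix are already expressed in millions.

```r
# Hypothetical helper: convert "$1.3M", "$450K", or "40" to millions of USD.
# Assumption: entries without an M or K suffix are already in millions.
to_millions <- function(x) {
  x <- trimws(as.character(x))
  # Keep only digits and the decimal point, then parse as a number
  num <- as.numeric(gsub("[^0-9.]", "", x))
  # K means thousands, so scale down; M and bare numbers are left as-is
  ifelse(grepl("K$", x), num / 1000, num)
}

vals <- c("$1.3M", "$450K", "40")
to_millions(vals)
# values: 1.3, 0.45, 40 (millions of USD)
```

Applied to the column from the question, this would look something like PL$`VALUE (M)` <- to_millions(PL$`VALUE (M)`).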
user2957945