R packages that parse numbers from English

Question

Are there any packages in R that can "understand" numbers written in English, for example:

"50 million" -> 50,000,000
"$17.9M" -> 17,900,000

It doesn't have to handle all possible cases, but I want to see how people tackle this problem so I can learn from their code and write my own solution.

You could reverse engineer some of the ideas from [this question](http://stackoverflow.com/questions/28159936/formatting-large-currency-or-dollar-values-to-millions-billions) — Rich Scriven, Jul 24 '15 at 04:56
google (and wolfram alpha) are getting pretty good at those things, I'd try to look for an API — baptiste, Jul 24 '15 at 05:14
A similar question has also been discussed [here](http://stackoverflow.com/questions/11340444/is-there-an-r-function-to-format-number-using-unit-prefix) — RHertel, Jul 24 '15 at 05:21

score 2 · Accepted Answer · answered Jul 24 '15 at 05:08

This is how I would approach it.

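library(stringr)
m <- your_vector                     # character vector, e.g. c("50 million", "$17.9M")
m <- tolower(m)                      # normalize case
m <- gsub(",", "", m)                # drop thousands separators
m <- gsub("$", "", m, fixed = TRUE)  # drop "$"; fixed = TRUE because "$" is a regex anchor
m <- gsub("\\s", "", m)              # drop spaces

dat <- data.frame(raw = m, stringsAsFactors = FALSE)
dat$words <- str_extract(m, "[a-z].*")                # extract the unit word, e.g. "million" or "m"
dat$numbers <- as.numeric(str_extract(m, "[0-9.]+"))  # extract the numeric part, keeping decimals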

Then create a new data.frame from unique(dat$words), merge, and multiply.

dat_merge <- data.frame(
  words = unique(dat$words),
  multiplier = c(1e6, 1e6) # one multiplier per unique word, set by hand after LOOKING at unique(dat$words)
) # lookup table: unit word -> multiplier

dat <- merge(dat, dat_merge)  # joins on the shared "words" column; note that merge() sorts rows by that key
dat$value <- dat$multiplier * dat$numbers

dat$value

I particularly like this approach because you can easily update it over time, especially when new formats turn up. I use it personally in a lot of projects for verbatim company names and some other small text elements.
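
As a rough, self-contained sketch of the same lookup idea in base R only: the function name parse_amount and the multiplier table units below are hypothetical (they do not come from any package), and the code assumes inputs shaped like the two examples in the question, i.e. a number optionally followed by a single unit word.

```r
# Hypothetical lookup table: unit word -> multiplier. Extend it as new formats appear.
units <- c(k = 1e3, thousand = 1e3,
           m = 1e6, million = 1e6,
           b = 1e9, billion = 1e9)

# Hypothetical helper: convert strings such as "50 million" or "$17.9M" to numbers.
parse_amount <- function(x) {
  x <- tolower(x)
  x <- gsub("[$,[:space:]]", "", x)           # drop "$", thousands separators, spaces
  num  <- as.numeric(sub("[a-z]+$", "", x))   # numeric part (NA if it is not a plain number)
  word <- sub("^[0-9.]+", "", x)              # trailing unit word, "" if there is none
  mult <- ifelse(word == "", 1, units[word])  # words missing from the table give NA
  unname(num * mult)
}

parse_amount(c("50 million", "$17.9M", "1,200"))
```

Unlike the merge-based version, this keeps the results in the same order as the input. A unit word that is not in the table yields NA rather than an error, so new formats show up as missing values you can inspect and then add to units.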
