5

My challenge is to convert number words such as "ten" and "one" into the digits 10 and 1 within an input sentence:

example_input <- paste0("I have ten apple and one orange")

The numbers may vary with user requirements, and the input sentence can be tokenized. The output I want to get is:

my_output_toget<-paste("I have 10 apple and 1 orange")
s__
  • Possible duplicate of [Convert integer to words](https://stackoverflow.com/questions/46652066/convert-integer-to-words) – Newl May 07 '19 at 09:05

4 Answers

6

We can pass a named list of key/value pairs as the replacement in gsubfn, so that each matched number word is replaced with its number:

library(english)
library(gsubfn)
gsubfn("\\w+", setNames(as.list(1:10), as.english(1:10)), example_input)
#[1] "I have 10 apple and 1 orange"
akrun
  • @SachinHegde Here, I assumed that you have values from `1:10`; if there are more, use `as.english(1:1000)` in the `setNames` call, i.e. `setNames(as.list(1:1000), as.english(1:1000))` – akrun May 07 '19 at 11:37
  • @SachinHegde Try `v1 <- c("I have one thousand apples", "I have one apple"); l1 <- setNames(list(1, 1000), as.english(c(1, 1000))); gsubfn("\\w+", l1, gsubfn("\\w+ \\w+", l1, v1)) # [1] "I have 1000 apples" "I have 1 apple"` – akrun May 07 '19 at 11:57
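
The two-pass idea in the last comment (replace two-word phrases before single words) can also be sketched in base R. This is only an illustrative sketch, not part of the original answer: the `lookup` vector and the `replace_number_words()` helper are hypothetical names, and the lookup is assumed to list every phrase you expect to see.

```r
# Hypothetical lookup table: names are the number words, values are the digits.
lookup <- c("one" = "1", "ten" = "10", "one thousand" = "1000")

# Hypothetical helper: replace longer phrases first so that
# "one thousand" is not consumed by the shorter pattern "one".
replace_number_words <- function(x, lookup) {
  ordered <- lookup[order(nchar(names(lookup)), decreasing = TRUE)]
  for (word in names(ordered)) {
    # \\b keeps "one" from matching inside words such as "money".
    x <- gsub(paste0("\\b", word, "\\b"), ordered[[word]], x, perl = TRUE)
  }
  x
}

v1 <- c("I have one thousand apples", "I have one apple")
replace_number_words(v1, lookup)
#[1] "I have 1000 apples" "I have 1 apple"
```

Sorting by phrase length plays the same role as running the `"\\w+ \\w+"` pass before the `"\\w+"` pass.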
3

textclean is a handy option for this task:

library(textclean)
mgsub(example_input, replace_number(seq_len(10)), seq_len(10))

[1] "I have 10 apple and 1 orange"

You just need to adjust the seq_len() parameter according to the maximum number in your data.

Some examples:

example_input <- c("I have one hundred apple and one orange")

mgsub(example_input, replace_number(seq_len(100)), seq_len(100))

[1] "I have 100 apple and 1 orange"

example_input <- c("I have one tousand apple and one orange")

mgsub(example_input, replace_number(seq_len(1000)), seq_len(1000))

[1] "I have 1 tousand apple and 1 orange"

Note that the misspelled "tousand" is not matched here, so only "one" is replaced.

If you don't know your maximum number beforehand, you can just choose a sufficiently big number.

tmfmnk
3

I wrote an R package to do this - https://github.com/fsingletonthorn/words_to_numbers which should work for more use cases.

devtools::install_github("fsingletonthorn/words_to_numbers")

library(wordstonumbers)

example_input <- "I have ten apple and one orange"

words_to_numbers(example_input)

[1] "I have 10 apple and 1 orange"

It also works for much more complex cases, such as:

words_to_numbers("The Library of Babel (by Jorge Luis Borges) describes a library that contains all possible four-hundred and ten page books made with a character set of twenty five characters (twenty two letters, as well as spaces, periods, and commas), with eighty lines per book and forty characters per line.")
#> [1] "The Library of Babel (by Jorge Luis Borges) describes a library that contains all possible 410 page books made with a character set of 25 characters (22 letters, as well as spaces, periods, and commas), with 80 lines per book and 40 characters per line."

Or

words_to_numbers("300 billion, 2 hundred and 79 cats")
#> [1] "300000000279 cats"
FelixST
1

Less elegant than akrun's answer, but in base R.

nums = c("one","two","three","four","five",
         "six","seven","eight","nine","ten")
example_input <- paste0("I have ten apple and one orange")

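# Split the sentence into words
aux <- strsplit(example_input, " ")[[1]]
# A word's position in nums equals its numeric value (NA if not a number word)
idx <- match(aux, nums)
aux[!is.na(idx)] <- idx[!is.na(idx)]
example_output <- paste(aux, collapse = " ")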
example_output
[1] "I have 10 apple and 1 orange"

We first split the sentence on spaces, find the words that match an entry in nums, and replace each one with its position in nums (which coincides with the number itself), then paste the words back together.
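
Splitting on spaces means a word followed by punctuation ("ten,") or written with a capital letter ("Ten") will not match. As a rough sketch of the same position-lookup idea that tolerates both and accepts a vector of sentences, one could use `regmatches()`. The helper name `words_to_digits()` is hypothetical, and it still only covers the single words listed in `nums`.

```r
nums <- c("one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten")

# Hypothetical helper: replace number words in each element of x.
words_to_digits <- function(x) {
  # Locate runs of letters so punctuation and spacing stay untouched.
  m <- gregexpr("[A-Za-z]+", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    # A word's position in nums equals its numeric value.
    idx <- match(tolower(tokens), nums)
    hit <- !is.na(idx)
    tokens[hit] <- idx[hit]
    tokens
  })
  x
}

words_to_digits(c("I have ten apples, and one orange.", "Two pears"))
#[1] "I have 10 apples, and 1 orange." "2 pears"
```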

boski