I am trying to extract spelled-out numbers from strings, along with the word that follows each number. I have managed to do this the laborious way, by writing my own pattern that lists the spelled-out numbers to search for (here is an example using stringr::sentences):
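library(stringr)

# Each number word, surrounded by spaces, followed by the next word
numbers <- str_c(c(" one ", " two ", " three ", " four ", " five ", " six ", " seven ", " eight ", " nine ", " ten "), "([^ ]+)")
number_match <- str_c(numbers, collapse = "|")
# Keep only the sentences that contain a match, then extract the match
reduced <- str_detect(sentences, number_match)
sent <- sentences[reduced]
str_extract(sent, number_match)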
These are the extracted strings:
[1] " seven books" " two met" " two factors" " three lists" " seven is" " two when" " ten inches." " one war"
[9] " one button" " six minutes." " ten years" " two shares" " two distinct" " five cents" " two pins" " five robins."
[17] " four kinds" " three story" " three inches" " six comes" " three batches" " two leaves."
Since I cannot know in advance whether my list covers every number that might occur, I was wondering whether R provides a tool that can identify spelled-out numbers. I have found similar questions, e.g. "Convert spelled out number to number", but unfortunately that one is not about R.
Any help is appreciated.