0

I have a single-column data frame where each row is a statement. The statements are mostly alphabetic, but a few contain numbers written as digits. I am trying to locate all of those numbers and replace them with the corresponding number words.

Basically, I want to go from this

 "I looked at the watermelons around 12 today"
 "There is a dog on the bench"
 "the year is 2017"
 "I am not hungry"
 "He turned 1 today"

into (or something similar to)

 "I looked at the watermelons around twelve today"
 "There is a dog on the bench"
 "the year is two thousand seventeen"
 "I am not hungry"
 "He turned one today"

There are functions I am familiar with that turn numbers into words, such as the numbers_to_words function from the xfun package, but I don't know how to do this systematically for the entire data frame.

Alokin

2 Answers

2

Here's one approach with the stringr and english packages.

library(stringr)
library(english)

data <- c("I looked at the watermelons around 12 today", "There is a dog on the bench", "the year is 2017", "I am not hungry", "He turned 1 today")

# Replace every run of digits with its English words. The replacement
# function is applied to the matched digits, so strings without numbers
# are returned unchanged and several numbers in one string are each converted.
str_replace_all(data, "[0-9]+",
                function(x) as.character(as.english(as.numeric(x))))
[1] "I looked at the watermelons around twelve today" "There is a dog on the bench"                    
[3] "the year is two thousand seventeen"              "I am not hungry"                                
[5] "He turned one today"  
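
For the data frame case in the question, a base R sketch along the following lines might help. It is only an illustration under a few assumptions: the column name `statement` is hypothetical, and `number_to_words()` is a stand-in converter written for this example that handles whole numbers from 0 to 999,999. A converter that returns a character string, such as `xfun::numbers_to_words`, could be passed as `to_words` instead. Because `regmatches<-` replaces each run of digits separately, several numbers in one statement are each converted.

```r
# Hypothetical stand-in converter for whole numbers 0..999999 (base R only).
small_words <- c("one", "two", "three", "four", "five", "six", "seven",
                 "eight", "nine", "ten", "eleven", "twelve", "thirteen",
                 "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
                 "nineteen")
tens_words <- c("twenty", "thirty", "forty", "fifty", "sixty", "seventy",
                "eighty", "ninety")

number_to_words <- function(n) {
  if (n == 0) return("zero")
  # Spell out a value between 1 and 999
  below_thousand <- function(k) {
    parts <- character(0)
    if (k >= 100) {
      parts <- c(parts, small_words[k %/% 100], "hundred")
      k <- k %% 100
    }
    if (k >= 20) {
      tens <- tens_words[k %/% 10 - 1]
      ones <- k %% 10
      parts <- c(parts,
                 if (ones > 0) paste(tens, small_words[ones], sep = "-") else tens)
    } else if (k > 0) {
      parts <- c(parts, small_words[k])
    }
    paste(parts, collapse = " ")
  }
  parts <- character(0)
  if (n >= 1000) parts <- c(parts, below_thousand(n %/% 1000), "thousand")
  if (n %% 1000 > 0) parts <- c(parts, below_thousand(n %% 1000))
  paste(parts, collapse = " ")
}

# Replace every run of digits in a character vector with words.
replace_numbers <- function(text, to_words = number_to_words) {
  hits <- gregexpr("[0-9]+", text)
  regmatches(text, hits) <- lapply(regmatches(text, hits), function(found) {
    # 'found' holds the digit runs of one string (empty if there are none)
    vapply(as.numeric(found), to_words, character(1))
  })
  text
}

# Assumed layout: one character column, here called 'statement'
df <- data.frame(
  statement = c("I looked at the watermelons around 12 today",
                "There is a dog on the bench",
                "the year is 2017",
                "I am not hungry",
                "He turned 1 today"),
  stringsAsFactors = FALSE
)
df$statement <- replace_numbers(df$statement)
print(df$statement)
```

If the column was read in as a factor rather than `chr`, wrapping it in `as.character()` before calling `replace_numbers()` should avoid surprises.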

Ian Campbell
  • This does not seem to work on my machine. I still get the numeric values to be numeric. What data type do you have your data as? Mine comes up as chr. – Alokin Mar 25 '20 at 15:36
0

I don't know of a ready-made function for this, but here is a somewhat clumsy solution:

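library(xfun)
a <- "I looked at the watermelons around 12 today"
# Try to convert each character to a number; non-digits become NA
y <- numeric(nchar(a))
for (i in seq_len(nchar(a))) {
  y[i] <- suppressWarnings(as.numeric(substr(a, i, i)))
}
# Glue the digits back together and convert the number to words
x <- n2w(as.numeric(paste(na.omit(y), collapse = "")))
# Positions of the digits in the original string
z <- which(!is.na(y))
paste0(substr(a, 1, z[1] - 1), x, substr(a, z[length(z)] + 1, nchar(a)))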

At the moment it only works for a single number in a single sentence.

Gregor Thomas
Triss
  • Is there any way to do this in Python? – Alokin Mar 25 '20 at 14:53
  • I'm not that familiar with Python, unfortunately. – Triss Mar 25 '20 at 15:01
  • The "tidyr" package has the function "extract_numeric", so you can use "extract_numeric(yoursentence)" to find the numbers. An alternative is as.numeric(gsub("[^\\d]+", "", yoursentence, perl=TRUE)). – Triss Mar 25 '20 at 15:03
  • @Triss FYI, for code formatting use 3 backticks before the code block and 3 backticks after the code block, you don't need backticks on every line. – Gregor Thomas Mar 25 '20 at 15:46