How can I convert all the Roman (Latin) numerals (such as "xxv," "xxxv," "iii," and "ii") in text data into numerical values with R?

I need to convert all the Roman numerals in my text data into numerical values. Is there a function in R that can convert all of them at once?

In addition, if I replace them one by one, what happens to words that contain letter sequences like "ii" or "i"? For example, would the word "still" be changed into "st1ll"?

Emily

1 Answer

txt <- 'How to convert all the Latin numbers (such as "xxv," "xxxv," "iii," and "ii") into numerical values in text data with R?
  
I need to convert all the Latin numbers in a text data into numerical values. Is there any function in R can convert all the Latin numbers at once?
  
In addition, when I replace one by one, what if I have some words contains letters like "ii", "i"? For example, would the world "still" be changed into "st1ll"?'

Get a vector of Roman numerals as character strings. (Note that if you make the range too large, gregexpr will throw an error. I didn't test where the limit is, but it's somewhere between 1e2 and 1e3.)

Exclude "I", since a standalone "I" is more likely not to be a numeral, then build your pattern and treat it like any other string find/replace:

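# Roman numerals for 1..100 as character strings
rom <- as.character(as.roman(1:1e2))
# drop "I", which is usually not a numeral in running text
rom <- setdiff(rom, 'I')

# whole-word alternation of the form \b(II|III|IV|...)\b
p <- sprintf('\\b(%s)\\b', paste0(na.omit(rom), collapse = '|'))
m <- gregexpr(p, txt, ignore.case = TRUE)
# replace each match with its numeric value
regmatches(txt, m) <- lapply(regmatches(txt, m), function(x) as.numeric(as.roman(x)))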

cat(txt)

# How to convert all the Latin numbers (such as "25," "35," "3," and "2") into numerical values in text data with R?
#   
# I need to convert all the Latin numbers in a text data into numerical values. Is there any function in R can convert all the Latin numbers at once?
#   
# In addition, when I replace one by one, what if I have some words contains letters like "2", "i"? For example, would the world "still" be changed into "st1ll"?
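On the "still" vs. "st1ll" worry: the `\\b` word boundaries mean only whole tokens are candidates, so letters inside a longer word are left alone. Here is a minimal sketch to check that on a made-up sentence (the string `s` is hypothetical, not from the original data; the pattern is built the same way as above):

```r
# Hypothetical sample sentence, used only for illustration
s <- "The civil list in chapter iv is still mixed; see section xii."

# Same construction as above: numerals for 1..100, minus "I"
rom <- setdiff(as.character(as.roman(1:100)), "I")
p <- sprintf("\\b(%s)\\b", paste0(rom, collapse = "|"))

# Extract exactly what the pattern matches in the sentence
hits <- regmatches(s, gregexpr(p, s, ignore.case = TRUE))[[1]]
print(hits)
```

Only the standalone tokens "iv" and "xii" are picked up; "civil", "still" and "mixed" are untouched. The flip side is that a standalone word that happens to spell a numeral in the range (for example "vi") would still be converted, so you may want to add such words to the excluded set.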

As a function:

dd <- data.frame(
  texts = rep(txt, 5),
  stringsAsFactors = FALSE  # keep texts as character (the default before R 4.0 was factor)
)

rom_to_num <- function(text, rom = 1:1e2, exclude = 'I') {
  rom <- as.character(as.roman(rom))
  rom <- setdiff(rom, exclude)
  
  p <- sprintf('\\b(%s)\\b', paste0(na.omit(rom), collapse = '|'))
  m <- gregexpr(p, text, ignore.case = TRUE)
  regmatches(text, m) <- lapply(regmatches(text, m), function(x) as.numeric(as.roman(x)))
  
  text
}

dd <- within(dd, {
  texts_new <- rom_to_num(texts)
})
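To see that each row keeps its own text, here is a small sketch with a different string in every row. The data frame `dd2`, its `id` column and the sample strings are made up for illustration; `rom_to_num` is repeated from above only so the snippet runs on its own:

```r
# Same function as above, repeated so this snippet is self-contained
rom_to_num <- function(text, rom = 1:1e2, exclude = 'I') {
  rom <- as.character(as.roman(rom))
  rom <- setdiff(rom, exclude)

  p <- sprintf('\\b(%s)\\b', paste0(na.omit(rom), collapse = '|'))
  m <- gregexpr(p, text, ignore.case = TRUE)
  regmatches(text, m) <- lapply(regmatches(text, m), function(x) as.numeric(as.roman(x)))

  text
}

# Hypothetical data: one text per observation
dd2 <- data.frame(
  id = 1:3,
  texts = c("chapter ii", "sections xiv and xl", "volume xcix"),
  stringsAsFactors = FALSE
)

# gregexpr and regmatches are vectorised, so the result has one element per row
dd2$texts_new <- rom_to_num(dd2$texts)
print(dd2)
```

The new column should read "chapter 2", "sections 14 and 40" and "volume 99", one per row. If the texts end up merged into one string, the usual cause is pasting or collapsing the column before calling the function rather than passing the column itself.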
rawr
  • Is it possible to convert only standalone Latin numbers (I mean with a space before and after)? I do not want "ii" and "iv" inside other words in the text to disappear. – Emily Nov 07 '22 at 08:29
  • See the answer to your other question here: https://stackoverflow.com/questions/74399312/how-to-select-the-number-in-a-text-r/74400995#74400995 – dufei Nov 11 '22 at 10:18
  • @Emily they don't; there are even instances of those not changing in your example – rawr Nov 11 '22 at 15:19
  • Thanks, you are right. Answer accepted! I just have one more follow-up question: my data is a data frame, with the text as a variable for over 1000 observations. I followed your code, and it converted the numbers correctly. But the text from all observations gets merged into a single piece and no longer corresponds to the observations one by one. How can I solve this problem? I am a beginner with R, and thank you in advance :) – Emily Nov 14 '22 at 16:30
  • @Emily you just need to apply the code in a loop or wrap it in a function; see edits – rawr Nov 14 '22 at 18:15