Before (as a rookie) I go submitting this as an R package bug, let me run it by y'all. I think all of the following are good:

replace_number("123 0 boogie")
[1] "one hundred twenty three boogie"
replace_number("1;1 foo")
[1] "one;one foo"
replace_number("47 bar")
[1] "forty seven bar"
replace_number("0")
[1] "zero"

I think all of the following are bad because "zero" is missing from the output:

replace_number("1;0 foo")
[1] "one; foo"
replace_number("00 bar")
[1] "bar"
replace_number("0x")
[1] "x"

Basically, I'd say that replace_number() is incapable of handling strings that contain the digit 0 (except for "0"). Is it a real bug?

smci
David Kaufman
    You can file an issue. It can be a feature request or, at the very least, a warning message in the documentation. – Pierre L Aug 01 '16 at 23:34

1 Answer

If you dig into the guts of replace_number:

unlist(lapply(lapply(gsub(",([0-9])", "\\1", text.var), function(x) {
    if (!is.na(x) & length(unlist(strsplit(x, "([0-9])",
        perl = TRUE))) > 1) {
        num_sub(x, num.paste = num.paste)
    }
    else {
        x
    }
}), function(x) mgsub(0:9, ones, x)))

you can see that the problem occurs in qdap:::num_sub

qdap:::num_sub("101", num.paste = "combine") ## "onehundredone"
qdap:::num_sub("0", num.paste = "combine")   ## ""

Digging within that function, the issue occurs in numb2word, which has internal codes

ones <- c("", "one", "two", "three", "four", "five", "six", 
    "seven", "eight", "nine")
names(ones) <- 0:9

which convert zero values to blanks. If I were facing this problem myself, I would fork the qdap repo, go to replace_number.R, and try to change this in a backward-compatible way: give replace_number a logical argument blank_zeros = TRUE that is passed down to numb2word, where it decides what the first element of ones is, e.g.

ones <- c(if (blank_zeros) "" else "zero",
          "one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine")
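
To make the effect of that switch concrete, here is a minimal base-R sketch. It is not qdap code: digits_to_words is a hypothetical helper that spells out each digit on its own (it does not build "one hundred ..." the way numb2word does), and blank_zeros is the proposed argument, which does not exist in qdap as far as this answer knows. It only shows how the first element of the lookup vector decides whether zeros vanish.

```r
# Hypothetical helper, NOT part of qdap: replace every digit in a string
# with its word, using the same kind of lookup vector as numb2word.
# blank_zeros = TRUE mimics the current behaviour ("0" becomes "");
# blank_zeros = FALSE maps "0" to "zero".
digits_to_words <- function(x, blank_zeros = TRUE) {
  ones <- c(if (blank_zeros) "" else "zero",
            "one", "two", "three", "four", "five", "six",
            "seven", "eight", "nine")
  names(ones) <- 0:9

  vapply(x, function(s) {
    # Split into single characters and swap out only the digits.
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    is_digit <- chars %in% names(ones)
    chars[is_digit] <- ones[chars[is_digit]]
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

digits_to_words("1;0 foo")                       # "one; foo"
digits_to_words("1;0 foo", blank_zeros = FALSE)  # "one;zero foo"
```

With the default, the zero is dropped just as in the question's "1;0 foo" example. Defaulting to TRUE is what keeps the change backward compatible, since existing callers see no difference unless they opt in.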

In the meantime I have posted this on the qdap issues list.

Ben Bolker