
My function has to turn every uppercase letter in a given string into lowercase and vice versa. I usually solve such problems with loops, so my code is:

mirror_case <- function(x){
   for(i in x){
     ifelse(i==toupper(i),x <- 
       str_replace_all(x,i,tolower(i)),
            ifelse(i==tolower(i),x <- 
       str_replace_all(x,i,toupper(i)),
                  x <- gsub(i,i,x)))}
   return(x)}

I checked this on several strings. Sometimes it works and sometimes it doesn't.

> d
[1] "LKJLjlei 33"
> mirror_case(d)
[1] "LKJLjlei 33"

> e
[1] "asddf"
> mirror_case(e)
[1] "ASDDF"

> f
[1] "ASDDF"
> mirror_case(f)
[1] "asddf"

So, what's wrong with this function? I'd like not only the answer but also an explanation, so that I understand the problem and don't come back here with a similar question.

Stedy
Elena
  • @JohnColeman...that is a misconception. There is nothing wrong with `for` loops in R. Also, the *apply* family are hidden loops. See this [canonical thread](https://stackoverflow.com/questions/28983292/is-the-apply-family-really-not-vectorized). However `toupper()` and `tolower()` are vectorized as 李哲源 shows. – Parfait Oct 05 '18 at 01:23

3 Answers


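In R a string is a single element of a character vector, not a sequence of characters as in Python, so `for (i in x)` runs once with `i` set to the whole string. That is why your function appears to work on all-lowercase or all-uppercase input (the whole string equals `tolower(i)` or `toupper(i)` and is replaced wholesale) but leaves mixed-case input unchanged (neither test is true, so only the no-op `gsub(i, i, x)` runs). Split the string into individual characters first. Try this: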

mirror_case <- function(s) {
  # break to characters
  chars <- strsplit(s, '') 
  # apply your ifelse statement to all characters
  mirror_chars <- sapply(chars, function(i) 
    ifelse(toupper(i) == i, tolower(i), toupper(i))) 
  # join back to a string
  mirror_s <- paste(mirror_chars, collapse = "")
  return(mirror_s)
}

mirror_case("LKJLjlei 33")
# [1] "lkjlJLEI 33"
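
To make the point about iteration concrete, here is a minimal sketch using only base R. The variable names (`x`, `iterations`, `chars`) are illustrative and not part of the original code.

```r
x <- "LKJLjlei 33"

length(x)  # number of elements in the character vector
nchar(x)   # number of characters in the single string

# The loop body runs once per vector element, so `i` is the whole string.
iterations <- 0
for (i in x) {
  iterations <- iterations + 1
  print(i)
}

# strsplit() returns a list with one character vector per input string;
# [[1]] extracts the vector of single characters for the first string.
chars <- strsplit(x, "")[[1]]
length(chars)
```

Looping over `chars` instead of `x` would visit one character at a time, which is what the original function assumed.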
Yosi Hammer
  • Using `Reduce` (like any fold operation) to concatenate strings results in quadratic time complexity without compiler optimisations (Python optimises binary string concatenation in some cases, but I'm not sure about R). For this very reason, Python, Java and many other languages have a builtin `join` method for strings. – Eli Korvigo Oct 05 '18 at 01:23
  • In R this builtin is called `paste`, which is what you are already passing to `Reduce`, but you can instead call `paste0(mirror_chars)` – Eli Korvigo Oct 05 '18 at 01:31
  • replacing `Reduce(paste0, mirror_chars)` with `paste0(mirror_chars)` results in a different output: `[1] "l" "k" "j" "l" "J" "L" "E" "I" " " "3" "3"` – Yosi Hammer Oct 05 '18 at 01:37
  • I beg your pardon, I should have written `paste(mirror_chars, collapse="")` instead. I forgot that `paste0` only changes `sep`. – Eli Korvigo Oct 05 '18 at 01:43

@YosiHammer's solution does not need the `sapply` call (which is a loop): `strsplit` returns a list with a single element here, so you can extract that character vector and work on it directly. As @李哲源 shows in the comments, `toupper()` and `tolower()`, like `gsub`, `paste`, and even `ifelse`, are vectorized functions and can receive multiple items in one call.

mirror_case <- function(s) {
  chars <- strsplit(s, '')[[1]]         # RETRIEVE THE CHARACTER VECTOR

  mirror_chars <- ifelse(toupper(chars) == chars, tolower(chars), toupper(chars))

  mirror_s = paste(mirror_chars, collapse = "")

  return(mirror_s)
}


mirror_case("LKJLjlei 33")
# [1] "lkjlJLEI 33"

mirror_case("AbCdEfGhIj")
# [1] "aBcDeFgHiJ"
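
As a small illustration of what "vectorized" means here, the sketch below applies the same expression to a hand-made character vector (the vector `chars` and the name `flipped` are made up for the example). Each function processes all elements in one call, with no explicit loop.

```r
chars <- c("a", "B", " ", "3")

# toupper() handles every element at once; non-letters are left unchanged.
toupper(chars)

# ifelse() tests every element and picks from the matching position of
# the second or third argument.
flipped <- ifelse(toupper(chars) == chars, tolower(chars), toupper(chars))
flipped
# [1] "A" "b" " " "3"
```

Characters without case, such as the space and the digit, satisfy `toupper(chars) == chars`, get passed through `tolower()`, and come out unchanged.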
marc_s
Parfait
  • Are you sure running 5 implicit loops (even if each loop is faster) instead of 1 explicit loop (if you consider `sapply` explicit) actually results in any performance improvement? That is possible, but adding benchmarks would greatly improve your answer. P.S. it most-certainly requires ~ x4 more RAM. – Eli Korvigo Oct 05 '18 at 02:08

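A simple solution to this problem is the `chartr` function, which translates each character of its first argument into the character at the same position of its second. Ranges such as `A-Z` are supported, but the arguments are not regular expressions, so the square brackets below are literal characters that simply map to themselves: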

chartr("[A-Za-z]", "[a-zA-Z]", "bbBB 122")

The function is vectorized:

chartr("[A-Za-z]", "[a-zA-Z]", c("bbBB 122", "QwER 12 bB"))
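
Because `chartr` expands ranges itself, the same translation can be written without the brackets. The following is a minimal sketch; the wrapper name `swap_case` is hypothetical and only there to make the call reusable.

```r
# Hypothetical helper: swap the case of ASCII letters, leave everything else.
swap_case <- function(x) chartr("A-Za-z", "a-zA-Z", x)

res <- swap_case(c("bbBB 122", "QwER 12 bB"))
res
# [1] "BBbb 122"   "qWer 12 Bb"

# Same result as the bracketed form, since "[" and "]" map to themselves.
identical(res, chartr("[A-Za-z]", "[a-zA-Z]", c("bbBB 122", "QwER 12 bB")))
```

Note that these ranges cover only the ASCII letters; accented or non-Latin letters would pass through unchanged.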

Another option is to pass a function to `str_replace_all`, but this is much slower, as the benchmark below shows.

library(stringr)
str_replace_all(c("bbBB 122", "QwER 12 bB"),
                "[A-Za-z]",
                function(x)
                  ifelse(toupper(x) == x, tolower(x), toupper(x)))

Benchmark:

The data are 100,000 strings of 10 characters each:

dat <- as.vector(
  replicate(1e5,
            paste0(sample(c(LETTERS,
                            letters,
                            " ",
                            as.character(1:9)),
                          10,
                          replace = TRUE),
                   collapse = "")
  ))

head(dat)
#output
"aPJAGOiirN" "FSYN DLYQS" "K7Vzh8qALH" "vQzU96JOVF" "WMmqO1D3Q8" "XdBiTG72zV"

Functions proposed in the other answers (not vectorized over a vector of strings, so they are applied with `lapply`):

mirror_case <- function(s) {
  chars <- strsplit(s, '')[[1]]         # RETRIEVE THE CHARACTER VECTOR

  mirror_chars <- ifelse(toupper(chars) == chars, tolower(chars), toupper(chars))

  mirror_s = paste(mirror_chars, collapse = "")

  return(mirror_s)
}

mirror.case <- function(s) {
  # break to characters
  chars <- strsplit(s, '') 
  # apply your ifelse statement to all characters
  mirror_chars <- sapply(chars, function(i) 
    ifelse(toupper(i) == i, tolower(i), toupper(i))) 
  # join back to a string
  mirror_s <- paste(mirror_chars, collapse = "")
  return(mirror_s)
}


library(microbenchmark)

microbenchmark(missuse = chartr("[A-Za-z]", "[a-zA-Z]", dat),
               missuse2 = str_replace_all(dat,
                                          "[A-Za-z]",
                                          function(x)
                                            ifelse(toupper(x) == x, tolower(x), toupper(x))),
               Parfait = lapply(dat, mirror_case),
               YosiHammer = lapply(dat, mirror.case),
               times = 10)

Results:

Unit: milliseconds
       expr          min          lq        mean      median          uq         max neval
    missuse     9.607483    11.05621    18.48764    16.50272    19.06369    39.65646    10
   missuse2 11226.900565 11473.40730 11612.95776 11582.65838 11636.32779 12218.78642    10
    Parfait  1461.056405  1572.58683  1700.75182  1594.43438  1746.08949  2149.49213    10
 YosiHammer  1526.730674  1576.35174  1649.55893  1607.62199  1670.76008  1843.11601    10

As you can see, the `chartr` method is around 100x faster than the `strsplit`-based functions and roughly 700x faster than `str_replace_all` (comparing medians).

Check equality of results:

all.equal(chartr("[A-Za-z]", "[a-zA-Z]", dat),
          unlist(lapply(dat, mirror_case)))

all.equal(chartr("[A-Za-z]", "[a-zA-Z]", dat),
          unlist(lapply(dat, mirror.case)))

all.equal(chartr("[A-Za-z]", "[a-zA-Z]", dat),
          str_replace_all(dat,
                          "[A-Za-z]",
                          function(x)
                            ifelse(toupper(x) == x, tolower(x), toupper(x))))
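
One caveat when scripting such checks: `all.equal` returns a character description of the differences rather than `FALSE` when its arguments differ, so it is usually wrapped in `isTRUE()`, or replaced by `identical()` for exact comparison. The sketch below does this on a two-string sample; the names `small`, `via_chartr`, and `via_split` are made up for the example, and the `vapply` wrapper is just one assumed way to run the split-based logic over several strings.

```r
small <- c("bbBB 122", "QwER 12 bB")

via_chartr <- chartr("A-Za-z", "a-zA-Z", small)

# Split-based approach applied to each string; vapply() guarantees that
# every result is a single character string.
via_split <- vapply(
  strsplit(small, ""),
  function(ch) paste(ifelse(toupper(ch) == ch, tolower(ch), toupper(ch)),
                     collapse = ""),
  character(1),
  USE.NAMES = FALSE
)

identical(via_chartr, via_split)
isTRUE(all.equal(via_chartr, via_split))
```

The two approaches agree on input like this, which contains only ASCII letters, digits, and spaces.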
marc_s
missuse