This question follows on from a discussion in the question "Capitalize the first letter of both words in a two word string".

I would like to write a function that accepts a character vector, capitalizes the first letter of every word in each sentence, and lower-cases the rest of each word. Capitalizing only the first letter of the whole string is easier:

CapitalizeFirstWord <- function(vector) {
  s <- sapply(sapply(vector, substring, 1, 1), toupper)
  t <- sapply(sapply(vector, substring, 2), tolower)
  binded <- cbind(s,t)
  apply(binded, 1, paste, collapse= "")
}

So that CapitalizeFirstWord(c("heLlo", "ABC", "GooD daY")) results in

 heLlo        ABC   GooD daY 
"Hello"      "Abc" "Good day" 

(I wrote it with help from the question "Paste multiple columns together".)

But I can't make it work so that every word in the sentence is capitalized.

This is my failed attempt:

CapitalizeEveryWord <- function(vector) {
  vectorS <- sapply(vector, strsplit, " ")
  s <- sapply(sapply(vectorS, substring, 1, 1), toupper)
  t <- sapply(sapply(vectorS, substring, 2), tolower)
  binded <- cbind(s,t)
  apply(binded, 1, paste, collapse= "")
}

So that CapitalizeEveryWord(c("heLlo", "ABC", "GooD daY")) results in

 heLlo                                 ABC                            GooD daY 
"Hello"                               "Abc" "c(\"G\", \"D\")c(\"ood\", \"ay\")" 

I don't know how to change the cbind() or paste() functions' behaviour so that it is rearranged correctly.
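
The odd output appears to come from strsplit() returning a list with one character vector per input string: once a string contains two words, cbind() ends up holding list elements and paste() deparses them as text. A minimal sketch of one way to repair the attempt is to process each element separately and paste its words back together. The name CapitalizeEveryWordFixed is hypothetical, and the sketch assumes words are separated by single spaces and that the input contains no NA values:

```r
# Hypothetical repaired version of the attempt above (base R only).
# Assumes single-space separators and no NA values in the input.
CapitalizeEveryWordFixed <- function(vector) {
  # strsplit() is already vectorised: it returns a list holding one
  # character vector of words for each input string.
  words <- strsplit(vector, " ", fixed = TRUE)
  # Handle each string on its own so its words are re-joined together.
  out <- vapply(words, function(w) {
    paste0(toupper(substring(w, 1, 1)), tolower(substring(w, 2)),
           collapse = " ")
  }, character(1))
  # Keep the original strings as names, as sapply() did above.
  names(out) <- vector
  out
}

CapitalizeEveryWordFixed(c("heLlo", "ABC", "GooD daY"))
#   heLlo        ABC   GooD daY
# "Hello"      "Abc" "Good Day"
```

This still calls an R function once per string, so it may not be the fastest option on a very large data frame.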

I am doing this work because I have a really large data frame which has most of its strings capitalized.

The script should take as little time as possible: iterating with a for() loop over every row and capitalizing only the first letter is very slow, and I am having trouble getting it to run in parallel with parLapply(). That's why I used the *apply() family of functions to write a new, faster function.

Geiser
2 Answers

Use the built-in function for this exact use-case from stringi:

library(stringi)

v1 <- c("heLlo", "ABC", "GooD daY")
stri_trans_totitle(v1)

## [1] "Hello"    "Abc"      "Good Day"
hrbrmstr

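We can use gsub with perl=TRUE: lower-case the input first, then upper-case the first character at each word boundary.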

gsub("\\b(.)", "\\U\\1", tolower(v1), perl=TRUE)
#[1] "Hello"    "Abc"      "Good Day"

To capitalize only the first word, use sub instead:

sub("(.)", "\\U\\1", tolower(v1), perl=TRUE)
#[1] "Hello"    "Abc"      "Good day"

data

v1 <- c("heLlo", "ABC", "GooD daY")
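
Since the question mentions a large data frame, here is a minimal sketch of applying the gsub() call to every character column in one vectorised pass per column. The helper name title_case and the example data frame (with its column names) are made up for illustration and are not from the question:

```r
# Hypothetical helper wrapping the gsub() call shown above.
title_case <- function(x) {
  gsub("\\b(.)", "\\U\\1", tolower(x), perl = TRUE)
}

# Made-up example data; the column names are illustrative only.
df <- data.frame(
  name = c("heLlo", "ABC", "GooD daY"),
  n = 1:3,
  stringsAsFactors = FALSE
)

# Find the character columns and convert only those,
# leaving numeric columns untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], title_case)

df$name
# [1] "Hello"    "Abc"      "Good Day"
```

One caveat: because \\b also matches around punctuation, a word such as "don't" would come out as "Don'T" with this pattern.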
akrun
  • Sorry for explaining myself badly. I also want to lower-case the rest of each word, and I've edited my question to reflect this. I want the first letter of every word capitalized, but not the rest of the word. Thank you for your quick answer. – Geiser May 06 '16 at 09:55
  • Both answers are correct, but I've tested that this way is a bit faster (in processing time) than the other. – Geiser May 09 '16 at 08:41