1

I have a mixed dataframe of characters, integers and factors that I'd like to convert to uppercase. This is a commonly posted question (e.g. here), but I can't find an answer that will change characters AND factors to uppercase without converting between data types. Working example below:


library(dplyr)    # glimpse, mutate_all, mutate_if, %>%
library(stringr)  # str_to_upper

# create a three column dataframe with characters, integers and factors:

df <- data.frame(v1=letters[1:5],v2=1:5,v3=as.factor(letters[10:14]),stringsAsFactors=FALSE)

  v1 v2 v3
1  a  1  j
2  b  2  k
3  c  3  l
4  d  4  m
5  e  5  n

glimpse(df)

# v1 <chr> "a", "b", "c", "d", "e"
# v2 <int> 1, 2, 3, 4, 5
# v3 <fct> j, k, l, m, n

mutate_all with toupper changes everything to uppercase, but converts factors (and integers) into characters:

df <- mutate_all(df, funs(toupper))
glimpse(df)

# v1 <chr> "A", "B", "C", "D", "E"
# v2 <chr> "1", "2", "3", "4", "5"
# v3 <chr> "J", "K", "L", "M", "N"

mutate_if and str_to_upper work for is.character but not for factors:

df <- df %>% mutate_if(is.character, str_to_upper)
glimpse(df)

# v1 <chr> "A", "B", "C", "D", "E"
# v2 <int> 1, 2, 3, 4, 5
# v3 <fct> j, k, l, m, n

mutate_if and str_to_upper work for is.factor BUT convert factors into characters:

df <- df %>% mutate_if(is.character, str_to_upper)
df <- df %>% mutate_if(is.factor, str_to_upper)
glimpse(df)

# v1 <chr> "A", "B", "C", "D", "E"
# v2 <int> 1, 2, 3, 4, 5
# v3 <chr> "J", "K", "L", "M", "N"

Ideally, I'd like to find a blanket solution that preserves data types and can be applied to any dataframe.

Thomas Moore

3 Answers

2
df %>% 
  mutate(across(where(is.character), str_to_upper),
         across(where(is.factor), ~ factor(str_to_upper(.x))))
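
One caveat worth checking before relying on `factor(str_to_upper(.x))`: rebuilding a factor from its uppercased values re-sorts the levels and drops unused ones. The base R sketch below illustrates this on a made-up factor (`f`, `rebuilt` and `renamed` are hypothetical names, not part of the question's data) and contrasts it with renaming the levels in place.

```r
# Hypothetical factor with a custom level order and one unused level ("mid")
f <- factor(c("low", "high", "low"), levels = c("low", "mid", "high"))

# Rebuilding the factor from uppercased values:
# levels are re-sorted alphabetically and the unused level disappears
rebuilt <- factor(toupper(f))
levels(rebuilt)
# "HIGH" "LOW"

# Renaming the levels in place keeps both the order and the unused level
renamed <- f
levels(renamed) <- toupper(levels(renamed))
levels(renamed)
# "LOW" "MID" "HIGH"
```

If level order or unused levels matter for later modelling or plotting, renaming the levels is probably the safer choice; for a factor like `v3` in the question, both approaches give the same result.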
Jingxin Zhang
  • Perfect, thanks Jingxin! Now... is there a way I can also change the colnames within this, as a single tidyverse solution? e.g. toupper(colnames(df)) ? – Thomas Moore Aug 17 '20 at 01:42
2

To build on Jingxin's response and address Thomas Moore's follow-up question, you can change the column names to upper with the following addition:

df %>% 
  mutate(across(where(is.character), str_to_upper),
         across(where(is.factor), ~ factor(str_to_upper(.x)))) %>%
  rename_with(str_to_upper)
Andrew Lee
1

toupper or str_to_upper changes the class to character. You have two options :

  1. Convert back to factor :
df <- df %>% mutate_if(is.character, str_to_upper)
df <- df %>% mutate_if(is.factor, ~factor(str_to_upper(.)))
str(df)

#'data.frame':  5 obs. of  3 variables:
# $ v1: chr  "A" "B" "C" "D" ...
# $ v2: int  1 2 3 4 5
# $ v3: Factor w/ 5 levels "J","K","L","M",..: 1 2 3 4 5
  2. Change the levels of the factor variables directly. This also lets you combine the two mutate_if calls from option 1 into one:
df <- df %>% mutate_if(~is.character(.) || is.factor(.), 
            ~if(is.factor(.)) {levels(.) <- toupper(levels(.));.} else toupper(.))

Note that the _if, _at and _all verbs have been superseded in dplyr 1.0.0 in favour of across.
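
For reference, the level-renaming idea could be written with across along the following lines. This is only a sketch, assuming dplyr 1.0.0 or later; `upper_levels` is a hypothetical helper name introduced here for illustration, and `out` is just a name for the result.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(v1 = letters[1:5], v2 = 1:5,
                 v3 = as.factor(letters[10:14]),
                 stringsAsFactors = FALSE)

# Hypothetical helper: uppercase a factor's levels without rebuilding it,
# so the column stays a factor and its level order is unchanged
upper_levels <- function(f) {
  levels(f) <- toupper(levels(f))
  f
}

out <- df %>%
  mutate(across(where(is.character), toupper),     # character columns
         across(where(is.factor), upper_levels)) %>%  # factor columns
  rename_with(toupper)                             # column names too

str(out)
```

Integer columns are not selected by either across call, so they keep their type.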

Ronak Shah