
I have a data frame in which some of the fields need to be truncated to different lengths. Here is some sample data:

library(dplyr)
library(stringr)

id <- c(1111, 2222, 3333, 4444, 5555)
first_name <- c("Jonathan", "Sally", "Courtney", "Stephen", "Matthew")
last_name <- c("Johnson", "Montgomery", "Cunningham", "Stephenson", "Matthews")
Height <- c(200, 160, 170, 180, 190)
 
df <- data.frame(id, first_name, last_name, Height, stringsAsFactors = FALSE)

I want to truncate the first_name and last_name columns to lengths of 3 and 5, respectively. I wrote a simple function that accepts a data frame, a field name, and a field length.

truncate_fields <- function(df, field, n) {
  df_tr <-
    mutate(df, {{field}} := str_trunc({{field}}, n, ellipsis = ""))
  return(df_tr)
}

The function works great. I now want to apply it to a subset of columns using a vector of lengths.

fields <- c("first_name", "last_name")
lengths <- c(3, 5)

for(i in 1:length(fields)) {
  df <-
    truncate_fields(df, noquote(fields)[i], lengths[i])
}

However, I get the following error:

Error in `splice()`:
! The LHS of `:=` must be a string or a symbol

I also tried lapply and mapply without much success. Any help is greatly appreciated. Thank you.

Update:

I am using an older version of dplyr (0.7.4).

Annabanana

2 Answers


The noquote function only affects how string values are printed to the console; noquote("variable") is very different from a bare variable. If you want to turn a string into a symbol, you can use rlang::sym and inject it into the calling expression with !!. So the following will work:

for (i in seq_along(fields)) {
  df <-
    truncate_fields(df, !!rlang::sym(fields[i]), lengths[i])
}

and that will be logically equivalent to

df <- truncate_fields(df, first_name, 3)
df <- truncate_fields(df, last_name, 5)
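To see the difference concretely, here is a small sketch using only base R and rlang (assuming rlang is installed; the variable names `nq`, `s`, `with_sym` and `with_chr` are just for illustration). It shows that `noquote()` still returns a character string, while `rlang::sym()` returns a symbol, and that `!!` injects the two very differently into a captured expression:

```r
library(rlang)

fields <- c("first_name", "last_name")

# noquote() only adds a "noquote" class that changes printing;
# the underlying value is still a character string
nq <- noquote(fields[1])
print(nq)          # [1] first_name
is.character(nq)   # TRUE

# sym() converts the string into a symbol (an R name object)
s <- sym(fields[1])
is_symbol(s)       # TRUE

# Injected with !!, a symbol becomes a bare column reference...
with_sym <- expr(str_trunc(!!s, 3))

# ...whereas a plain string stays a string literal
with_chr <- expr(str_trunc(!!fields[1], 3))

print(with_sym)    # str_trunc(first_name, 3)
print(with_chr)    # str_trunc("first_name", 3)
```

Only the symbol version refers to the column, which is why the LHS of `:=` needs a symbol (or a plain string), not a noquote object.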
MrFlick
  • Your solution worked beautifully in my dev environment. Unfortunately, I am stuck with older package versions in production, and I still get an error that the LHS must be a name or string. Any thoughts on how to overcome this with dplyr 0.7.4? Perhaps using `mutate_()`. – Annabanana May 01 '23 at 15:26

I would also add that you can use existing tools to do this without reassigning the data frame on every loop iteration. In fact, you do not even need a custom function if all you are doing is calling str_trunc:

df %>% 
  mutate(across(all_of(fields), ~ str_trunc(.x, lengths[fields == cur_column()], ellipsis = "")))

across applies a function to each column in a set. As it iterates over the columns, cur_column returns the name of the current column; matching it against fields gives the index used to pull the corresponding value from lengths.
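A variant of the same idea, sketched under the assumption of dplyr >= 1.0.0 (where across() and cur_column() exist) and stringr: storing the widths in a named vector lets the column name be used directly as the lookup key, so there is no separate index matching. `max_len` is a hypothetical name chosen for this example:

```r
library(dplyr)    # assumes dplyr >= 1.0.0 for across() and cur_column()
library(stringr)

df <- data.frame(
  id = c(1111, 2222, 3333, 4444, 5555),
  first_name = c("Jonathan", "Sally", "Courtney", "Stephen", "Matthew"),
  last_name = c("Johnson", "Montgomery", "Cunningham", "Stephenson", "Matthews"),
  Height = c(200, 160, 170, 180, 190),
  stringsAsFactors = FALSE
)

# Hypothetical named vector: the names select the columns and
# also serve as the key for looking up each column's width
max_len <- c(first_name = 3, last_name = 5)

out <- df %>%
  mutate(across(all_of(names(max_len)),
                ~ str_trunc(.x, max_len[[cur_column()]], ellipsis = "")))

out$first_name   # "Jon" "Sal" "Cou" "Ste" "Mat"
out$last_name    # "Johns" "Montg" "Cunni" "Steph" "Matth"
```

Passing `ellipsis = ""` matters: str_trunc's default ellipsis ("...") would otherwise consume the whole 3-character width for first_name.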

Update

Given that you are using an older version of dplyr, I think a clean solution uses the purrr package (this should work with older versions of purrr too):

library(purrr)
library(rlang)

# named vector is important for use in imap
names(lengths) <- fields

lexprs <- imap(lengths, ~ expr(str_trunc(!!sym(.y), width = !!.x, ellipsis = "")))

df %>% 
  mutate(!!! lexprs)

This creates a list of expressions, which you can inspect by printing lexprs. Within mutate, the splice operator !!! unpacks the list so that each expression is evaluated. This only works because lexprs carries forward the names from the named vector, so each expression becomes a named argument:

rlang::qq_show(
  df %>% 
    mutate(!!! lexprs)
)
df %>% mutate(first_name = str_trunc(first_name, width = 3, ellipsis = ""), last_name = str_trunc(
  last_name, width = 5, ellipsis = ""))
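Why the names matter can be seen without any data. This is a small sketch using rlang::expr(), which captures a call without evaluating it (so `df`, `mutate()` and `str_trunc()` need not exist here); `named`, `with_names` and `without_names` are illustrative names only:

```r
library(rlang)

# A named list of unevaluated expressions, shaped like lexprs
named <- list(
  first_name = quote(str_trunc(first_name, width = 3)),
  last_name  = quote(str_trunc(last_name, width = 5))
)

# !!! splices each element in as a separate argument; the list names
# become argument names, i.e. the columns mutate() would overwrite
with_names <- expr(mutate(df, !!!named))
print(with_names)

# Without names the spliced arguments are positional, so mutate() would
# add new columns named after the deparsed expressions instead
without_names <- expr(mutate(df, !!!unname(named)))
print(without_names)
```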

Update 2

From this SO question, you can use the <<- operator as a counter (here it updates the i defined in the global environment), as long as fields and lengths are in the same order:

i <- 0
df %>% 
  mutate_at(vars(fields), ~ str_trunc(.x, width = lengths[i <<- i + 1], ellipsis = ""))
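If neither the dplyr nor the rlang version in production can be relied on, a base R approach sidesteps both. This is a swapped-in technique, not what the answer above uses: Map() (a wrapper around mapply(..., SIMPLIFY = FALSE), which the question mentions trying) pairs each selected column with its width, and substr() stands in for str_trunc(..., ellipsis = ""). Like the counter trick, it assumes fields and lengths are in the same order:

```r
# Base R only: no dplyr, rlang or stringr version requirements
df <- data.frame(
  id = c(1111, 2222, 3333, 4444, 5555),
  first_name = c("Jonathan", "Sally", "Courtney", "Stephen", "Matthew"),
  last_name = c("Johnson", "Montgomery", "Cunningham", "Stephenson", "Matthews"),
  Height = c(200, 160, 170, 180, 190),
  stringsAsFactors = FALSE
)
fields <- c("first_name", "last_name")
lengths <- c(3, 5)

# Map() walks the selected columns and the widths in parallel; substr()
# keeps characters 1..n and leaves shorter strings unchanged. The
# resulting list is assigned back into the same columns.
df[fields] <- Map(function(col, n) substr(col, 1, n), df[fields], lengths)

df$first_name   # "Jon" "Sal" "Cou" "Ste" "Mat"
df$last_name    # "Johns" "Montg" "Cunni" "Steph" "Matth"
```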
LMc