I occasionally want to export a dataset to Stata, but some of my variable names exceed the 32-character limit that Stata allows, so I am trying to write a function that identifies those variables and truncates their names to 32 characters. The code below does everything I want (i.e. it displays the offending variable indices and names) except actually write the truncated names back to the data frame.

short_var_names <- function(data.in){
  long_names_indices <- data.frame(long_name_indices = which(nchar(names(data.in)) > 32))
  long_names <- data.frame(long_names = names(data.in[which(nchar(names(data.in)) > 32)]))
  short_names <- data.frame(short_names = names(data.in[as.numeric(unlist(long_names_indices))]))
  print(long_names_indices)
  print(long_names)
  print(short_names)
  names(data.in) <- substring(names(data.in), 1, 32) # now do the actual operation on the df
}

short_var_names(dat_clin)

When I use the following outside of the function, it works as expected:

names(dat_clin) <- substring(names(dat_clin), 1, 32)

So, why isn't this command being returned in the function?

LucaS
  • Any values that you change inside a function exist only inside that function. By default you cannot modify objects outside the scope of the function; normally you return the updated values from the function and save them somewhere. – MrFlick Dec 20 '22 at 20:47
  • Ok thanks. I'm new to writing functions. How do I save the updated df outside the function then (from within the function call)? – LucaS Dec 20 '22 at 20:51
  • Make sure that `data.in` or `return(data.in)` is the last line of your function, and then replace the original object when you call the function: `dat_clin <- short_var_names(dat_clin)` – MrFlick Dec 20 '22 at 20:52
  • Thank you both. I feel silly - I should have realised about resaving the object, although I didn't know about needing to explicitly return the object in the function call - I thought the last line I had did that already. – LucaS Dec 20 '22 at 20:58
  • The value of the last line of any function is indeed returned automatically. But when you have `names(data.in) <- "something"` as your last line, that assignment returns (invisibly) just the names; it doesn't return the data object. You just need to be a bit more careful if you have an assignment as the last line of your function. – MrFlick Dec 20 '22 at 21:02
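Putting the comments above together, a minimal sketch of the corrected function might look like the following. It assumes the only changes needed are to end the function with `data.in` and to reassign the result at the call site. The `max_len` argument and the toy `dat_clin` data frame are hypothetical additions for illustration, not part of the original code.

```r
# Sketch: truncate over-long names, then return the modified data frame.
# `max_len` is a hypothetical argument; 32 is the limit from the question.
short_var_names <- function(data.in, max_len = 32) {
  long_idx <- which(nchar(names(data.in)) > max_len)

  # Report the offending columns alongside their truncated names
  print(data.frame(
    index      = long_idx,
    long_name  = names(data.in)[long_idx],
    short_name = substring(names(data.in)[long_idx], 1, max_len)
  ))

  # This changes only the function's local copy of the data frame
  names(data.in) <- substring(names(data.in), 1, max_len)

  # Last expression: return the data frame, not the value of the assignment
  data.in
}

# Toy stand-in for dat_clin (made-up data, one 40-character name)
dat_clin <- data.frame(1:3, 4:6)
names(dat_clin) <- c("id", strrep("x", 40))

# Reassign at the call site so the change persists outside the function
dat_clin <- short_var_names(dat_clin)
max(nchar(names(dat_clin)))
```

Without the final `data.in` line, the function would return the character vector of truncated names instead of the data frame, which is why the original call appeared to do nothing.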

0 Answers