I am writing a series of functions that use dplyr internally to manipulate data.
There are a number of places where I'd like to add new variables to the data set as I work with it. However, I am not sure how to name these new variables so as to avoid overwriting variables already in the data, given that I don't know what's in the data set being passed.
In base R I can do this:
df <- data.frame(a = 1:5)
df[, ncol(df)+1] <- 6:10
and it will select a name for the newly-added variable that doesn't conflict with any existing names. I'd like to do this in dplyr rather than breaking up the consistent application of dplyr to go back to base R.
All the solutions I've thought of so far feel very kludgy, or require the use of a bunch of base-R futzing anyway that isn't any better than just adding the variable in base-R:
- Rename all the variables so I know what the names are
- Pull out the names() vector and use one of many methods to generate a name that isn't in the vector
- Error out if the user happens to have my internal variable names in their data (bad-practice Olympics!)
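For concreteness, the names()-vector option might look something like the minimal sketch below. The helper name fresh_name and its default base are hypothetical, made up for illustration; it leans on base R's make.unique(), which appends ".1", ".2", and so on to repeated entries.

```r
# Hypothetical helper (not part of dplyr): return `base` if it is unused,
# otherwise a suffixed variant that does not appear in `existing`.
fresh_name <- function(existing, base = "new_var") {
  # Append the candidate to the existing names, de-duplicate, keep the last
  tail(make.unique(c(existing, base)), 1)
}

fresh_name(c("a", "b"), "randomorder")            # "randomorder"
fresh_name(c("a", "randomorder"), "randomorder")  # "randomorder.1"
```

This is exactly the kind of base-R detour I was hoping to avoid, but it is at least only one line.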
Is there a straightforward way to do this in dplyr? Getting it to work in mutate would be ideal, although I suppose bind_cols or tibble::add_column would also be fine.
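To be clear, I know mutate can take a column name stored in a string through the := operator, so the mechanics of assigning to a computed name are not the issue. A small sketch, assuming a placeholder name tmp_col that I have checked by hand is not already in the data:

```r
library(dplyr)

df <- data.frame(a = 1:5)

# Assumption: `new_name` was chosen elsewhere and is not already a column
new_name <- "tmp_col"
stopifnot(!new_name %in% names(df))

# `!!new_name :=` lets mutate() use the string as the new column's name
out <- df %>% mutate(!!new_name := 6:10)
names(out)  # "a" "tmp_col"
```

What I am missing is a dplyr-native way to pick new_name so that it is guaranteed not to collide.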
Some things I have tried that don't work:
library(dplyr)
df <- data.frame(a = 1:5)
# Gives the new variable a fixed title which might already be in there
df %>% mutate(6:10)
df %>% tibble::add_column(6:10)
df %>% mutate(NULL = 6:10)
# Error
df %>% bind_cols(6:10)
df %>% mutate( = 6:10)
df %>% mutate(!!NULL := 6:10)
# And an example of the kind of function I'm looking at:
# This function returns the original data arranged in a random order
# and also the random variable used to arrange it
arrange_random <- function(df) {
  df <- df %>%
    mutate(randomorder = runif(n())) %>%
    arrange(randomorder)
  return(df)
}
# No naming conflict, no problem!
data <- data.frame(a = 1:5)
arrange_random(data)
# Uh-oh, the original data gets lost!
data <- data.frame(randomorder = 1:5)
arrange_random(data)
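For comparison, here is a sketch of the workaround I would fall back on if there is nothing cleaner: generate the name with base R's make.unique() and splice it into mutate and arrange. The base argument and the order_col variable are my own illustrative names, and the ".1" suffix is simply what make.unique() produces.

```r
library(dplyr)

arrange_random <- function(df, base = "randomorder") {
  # Pick a column name not already used in df; make.unique() adds ".1", ".2", ...
  order_col <- tail(make.unique(c(names(df), base)), 1)
  df %>%
    mutate(!!order_col := runif(n())) %>%  # add the random key under that name
    arrange(.data[[order_col]])            # sort by it via the .data pronoun
}

# The user's own randomorder column now survives
data <- data.frame(randomorder = 1:5)
result <- arrange_random(data)
names(result)  # "randomorder" "randomorder.1"
```

This works, but the caller then has to discover which name was actually used, which is part of why it feels kludgy.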