2

The difference between my question and existing questions is that I want to create new columns with mutate that do not depend on existing columns.

Some dummy data:

library(dplyr)
dat <- tibble(
    a = 1:5,
    b = LETTERS[1:5]
)

I know I can create new columns one-by-one like so:

dat <- dat %>%
    mutate(foo = NA, bar = NA, bar2 = NA)

And I can modify columns more conveniently using across, e.g.:

new_vars <- c("foo", "bar", "bar2")
dat <- dat %>%
    mutate(across(all_of(new_vars), ~ replace(., is.na(.), 0)))

But how do I create new columns without referencing existing columns in a similar manner? E.g. adding new columns filled with NA:

tibble(
    a = 1:5,
    b = LETTERS[1:5]
) %>% 
    # mutate(across(all_of(new_vars), ~ function(.x) NA))  # Error
    mutate(across(all_of(new_vars), NA))                   # Error

Open to any tidyverse alternatives.

Earlien
  • I also tried replacing `new_vars` with `!!!syms(new_vars)` as I thought would be required, but this still results in an error. – Earlien Aug 09 '23 at 00:47
  • 1
    To create a column requires that it have both a name and a value (or values equal to the length of the existing columns). So you can use `setNames(rep(NA, length(new_vars)), new_vars)` to create the name value pairs, then splice this into the mutate call: `dat %>% mutate(!!!setNames(rep(NA, length(new_vars)), new_vars))`. – Ritchie Sacramento Aug 09 '23 at 01:17
  • Whoever closed this, this is not a duplicate. The answers in the linked question use `[]` which is the code I'm trying to replace - it is not pipe friendly. I need a `tidyverse` solution. Please reopen. – Earlien Aug 09 '23 at 01:29
  • The first answer in the linked question used `[]` - the second one works too, and does not. Your question is a duplicate in the sense that the same problem has been solved but if you specifically want pipes + tidyverse, fair enough. – neilfws Aug 09 '23 at 01:42
  • 1
    There are also [multiple solutions here](https://stackoverflow.com/questions/18214395/add-empty-columns-to-a-dataframe-with-specified-names-from-a-vector) including the one proposed by Ritchie. – neilfws Aug 09 '23 at 01:51
  • 1
    @Earlien could you clarify why the `df %>% mutate(!!!setNames(rep(NA, length(new_vars)), new_vars))` approach [found here](https://stackoverflow.com/a/74020972/12109788) and mentioned by Ritchie Sacramento does not fit your needs? It seems to work perfectly and is tidy compatible. – jpsmith Aug 09 '23 at 01:54
  • @jpsmith Was just looking at that - that actually does look very similar to what I was envisaging with `across`. I'm happy to accept that as the answer (since it only got 1 vote on that question compared to the accepted answer of 85 votes, I think a new answer is warranted. Could you add it?) – Earlien Aug 09 '23 at 01:57

4 Answers

4

Similar to this answer buried in the popular question here, you can use:

new_vars <- c("foo", "bar", "bar2")

tibble(
  a = 1:5,
  b = LETTERS[1:5]
) %>% 
  mutate(!!!setNames(rep(NA, length(new_vars)), new_vars))
# or (thanks @joran)
# tibble::add_column(!!!setNames(rep(NA, length(new_vars)), new_vars))

output

      a b     foo   bar   bar2 
  <int> <chr> <lgl> <lgl> <lgl>
1     1 A     NA    NA    NA   
2     2 B     NA    NA    NA   
3     3 C     NA    NA    NA   
4     4 D     NA    NA    NA   
5     5 E     NA    NA    NA   
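
To see why this works, it may help to look at what setNames() builds before `!!!` splices it in. A minimal sketch reusing the question's data; `na_cols` and `res` are names introduced here only for illustration:

```r
library(dplyr)

dat <- tibble(a = 1:5, b = LETTERS[1:5])
new_vars <- c("foo", "bar", "bar2")

# A named logical vector: one NA per new column name
na_cols <- setNames(rep(NA, length(new_vars)), new_vars)

# `!!!` splices each name-value pair into mutate() as a separate argument,
# so this should behave like mutate(foo = NA, bar = NA, bar2 = NA)
res <- dat %>% mutate(!!!na_cols)
```

The new columns are logical because a bare NA is logical; swapping in rep(NA_real_, ...) or rep(NA_character_, ...) should give numeric or character columns of missing values instead.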
jpsmith
  • This method also works with just `tibble::add_column` in place of `mutate`. – joran Aug 09 '23 at 02:33
  • @joran thanks! I included this in the edit. I was just trying to do justice to the other answer, I also agree with your answer that base is the way to go in this context (+1!) – jpsmith Aug 09 '23 at 02:36
4

I use tidyverse stuff as much as the next fellow, but the lengths we go to in order to avoid doing things the simple way are getting a little silly, imho.

Here. Pipe friendly.

library(dplyr)
dat <- tibble(
  a = 1:5,
  b = LETTERS[1:5]
)

new_vars <- c("foo", "bar", "bar2")

# ?
# dat[new_vars] <- NA

add_vars <- function(df,vars,val){
  df[vars] <- val
  df
}

dat |>
  add_vars(df = _,vars = new_vars,val = NA)

You could even use an anonymous function (as written, this works only with the magrittr pipe):

dat %>%
  (\(x) {x[new_vars] <- NA; x})

This also works (with the magrittr pipe) with the function(x) syntax.
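
For the native pipe, a sketch of the same idea, assuming R 4.1 or later: the anonymous function has to be wrapped in parentheses and then called explicitly with a trailing `()`. The name `res` is introduced here only for illustration.

```r
library(dplyr)

dat <- tibble(a = 1:5, b = LETTERS[1:5])
new_vars <- c("foo", "bar", "bar2")

# The native pipe needs a function *call* on its right-hand side,
# hence the extra () after the parenthesised lambda
res <- dat |>
  (\(x) {x[new_vars] <- NA; x})()
```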

joran
3

Using dplyr::bind_cols() and pipes:

library(dplyr)

new_vars <- c("foo", "bar", "bar2")

# Build a named list with one NA per new column, then bind it on
tibble(a = 1:5,
       b = LETTERS[1:5]) %>% 
  bind_cols(., setNames(lapply(new_vars, function(x) NA), new_vars))

Result:

# A tibble: 5 × 5
      a b     foo   bar   bar2 
  <int> <chr> <lgl> <lgl> <lgl>
1     1 A     NA    NA    NA   
2     2 B     NA    NA    NA   
3     3 C     NA    NA    NA   
4     4 D     NA    NA    NA   
5     5 E     NA    NA    NA

Although I think the second answer to this question, on which this is based, is just as good.

If you really want mutate, Ritchie's answer in the comments works.

neilfws
  • Thanks, I was just looking at that answer. It looks so unwieldy compared to the `dat[new_vars] <- NA` approach, but at least it is tidy `compatible`. I'm hoping there is a neater solution though. – Earlien Aug 09 '23 at 01:44
1

Maybe this is the style you're looking for:

library(dplyr)

dat <- tibble(
    a = 1:5,
    b = LETTERS[1:5]
)

new_vars <- c("foo", "bar", "bar2")

dat %>% 
    purrr::reduce(new_vars, ~mutate(.x, {{.y}} := 0), .init = .)

Instead of across(), this uses purrr::reduce() to loop over new_vars, applying mutate() to the output of the previous iteration. The starting value should be dat, so it is piped in as .init = .; because the dot appears as a direct argument, magrittr does not also insert dat as the first argument.

You could even use reduce2 if you wanted to have different values for each of the new_vars.
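
A rough sketch of that reduce2 variant, under a few assumptions: `vals` is a hypothetical list of fill values (one per new column), `res` is an illustrative name, and an ordinary function with `!!nm := val` is used in place of the lambda shorthand above.

```r
library(dplyr)

dat <- tibble(a = 1:5, b = LETTERS[1:5])
new_vars <- c("foo", "bar", "bar2")

# Hypothetical fill values, one per new column
vals <- list(NA, 0, "none")

# reduce2() walks new_vars and vals in parallel; df is the accumulated tibble.
# With .init supplied, both inputs must have the same length.
res <- purrr::reduce2(
  new_vars, vals,
  function(df, nm, val) mutate(df, !!nm := val),
  .init = dat
)
```

If the loop is not needed, splicing a named list, as in mutate(!!!setNames(vals, new_vars)), should give the same columns in one call.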

Michael Dewar