Creating multiple NEW columns using across() in R

Question

The difference between my question and existing questions is that I want to create new columns with mutate that do not depend on existing columns.

Some dummy data:

library(dplyr)
dat <- tibble(
    a = 1:5,
    b = LETTERS[1:5]
)

I know I can create new columns one-by-one like so

dat <- dat %>%
    mutate(foo = NA, bar = NA, bar2 = NA)

And I can modify columns more conveniently using across(), e.g.:

new_vars <- c("foo", "bar", "bar2")
dat <- dat %>%
    mutate(across(all_of(new_vars), ~ replace(., is.na(.), 0)))

But how do I create new columns without referencing existing columns in a similar manner? E.g. adding new columns filled with NA:

tibble(
    a = 1:5,
    b = LETTERS[1:5]
) %>% 
    # mutate(across(all_of(new_vars), ~ function(.x) NA))  # Error
    mutate(across(all_of(new_vars), NA))                   # Error

Open to any tidyverse alternatives.

I also tried replacing `new_vars` with `!!!syms(new_vars)` as I thought would be required, but this still results in an error. — Earlien, Aug 09 '23 at 00:47
Creating a column requires both a name and a value (either a single value or one per existing row). So you can use `setNames(rep(NA, length(new_vars)), new_vars)` to create the name-value pairs, then splice them into the mutate call: `dat %>% mutate(!!!setNames(rep(NA, length(new_vars)), new_vars))`. — Ritchie Sacramento, Aug 09 '23 at 01:17
Whoever closed this, this is not a duplicate. The answers in the linked question use `[]`, which is the code I'm trying to replace - it is not pipe friendly. I need a `tidyverse` solution. Please reopen. — Earlien, Aug 09 '23 at 01:29
The first answer in the linked question used `[]`; the second one works too and does not use `[]`. Your question is a duplicate in the sense that the same problem has been solved, but if you specifically want pipes + tidyverse, fair enough. — neilfws, Aug 09 '23 at 01:42
There are also [multiple solutions here](https://stackoverflow.com/questions/18214395/add-empty-columns-to-a-dataframe-with-specified-names-from-a-vector) including the one proposed by Ritchie. — neilfws, Aug 09 '23 at 01:51
@Earlien could you clarify why the `df %>% mutate(!!!setNames(rep(NA, length(new_vars)), new_vars))` approach [found here](https://stackoverflow.com/a/74020972/12109788) and mentioned by Ritchie Sacramento does not fit your needs? It seems to work perfectly and is tidy compatible. — jpsmith, Aug 09 '23 at 01:54
@jpsmith Was just looking at that - that actually does look very similar to what I was envisaging with `across`. I'm happy to accept that as the answer (since it only got 1 vote on that question compared to the accepted answer of 85 votes, I think a new answer is warranted. Could you add it?) — Earlien, Aug 09 '23 at 01:57

jpsmith · Accepted Answer · 2023-08-09T02:34:14.113

4

Similar to this answer buried in the popular question here, you can use:

new_vars <- c("foo", "bar", "bar2")

tibble(
  a = 1:5,
  b = LETTERS[1:5]
) %>% 
  mutate(!!!setNames(rep(NA, length(new_vars)), new_vars))
# or (thanks @joran)
# tibble::add_column(!!!setNames(rep(NA, length(new_vars)), new_vars))

output

     a b     foo   bar   bar2 
  <int> <chr> <lgl> <lgl> <lgl>
1     1 A     NA    NA    NA   
2     2 B     NA    NA    NA   
3     3 C     NA    NA    NA   
4     4 D     NA    NA    NA   
5     5 E     NA    NA    NA
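
A note on the output above: a bare `NA` is logical, so the new columns come out as `<lgl>`. If specific column types are wanted, the same `!!!` splicing should also work with a named list of typed missing values. A minimal sketch, where the `new_cols` list and its type choices are illustrative assumptions rather than part of the original answer:

```r
library(dplyr)

dat <- tibble(a = 1:5, b = LETTERS[1:5])

# Hypothetical named list: names become column names, values are typed NAs
new_cols <- list(foo = NA_real_, bar = NA_character_, bar2 = NA_integer_)

# `!!!` splices each list element into mutate() as a name = value argument;
# each length-1 value is recycled to the number of rows
out <- dat %>%
  mutate(!!!new_cols)
```

Here `foo` would be a double column, `bar` a character column and `bar2` an integer column, all filled with missing values.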

edited Aug 09 '23 at 02:34

answered Aug 09 '23 at 02:03


This method also works with just `tibble::add_column` in place of `mutate`. – joran Aug 09 '23 at 02:33
@joran thanks! I included this in the edit. I was just trying to do justice to the other answer, I also agree with your answer that base is the way to go in this context (+1!) – jpsmith Aug 09 '23 at 02:36

score 4 · Answer 2 · answered Aug 09 '23 at 02:21

I use tidyverse stuff as much as the next fellow, but the lengths we're going to in order to avoid doing things the simple way are getting a little silly, imho.

Here. Pipe friendly.

library(dplyr)
dat <- tibble(
  a = 1:5,
  b = LETTERS[1:5]
)

new_vars <- c("foo", "bar", "bar2")

# ?
# dat[new_vars] <- NA

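# Helper: assign `val` to the columns named in `vars`, then return the data frame
add_vars <- function(df, vars, val) {
  df[vars] <- val
  df
}

# The `_` placeholder for the native pipe requires R >= 4.2
dat |>
  add_vars(df = _, vars = new_vars, val = NA)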

You could even use an anonymous function (as written, this only works with the magrittr pipe):

dat %>%
  (\(x) {x[new_vars] <- NA; x})

This also works (with the magrittr pipe) with the function(x) syntax.
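
With the native pipe, the anonymous function has to be called explicitly, because `|>` expects a function call on its right-hand side. A sketch of that variant, assuming R 4.1 or later (for both `|>` and the `\(x)` shorthand); it is an adaptation, not code from the original answer:

```r
library(dplyr)

dat <- tibble(a = 1:5, b = LETTERS[1:5])
new_vars <- c("foo", "bar", "bar2")

# Wrap the lambda in parentheses and call it with a trailing ()
# so that the right-hand side of |> is a function call
out <- dat |>
  (\(x) { x[new_vars] <- NA; x })()
```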

neilfws · Answer 3 · 2023-08-09T01:44:22.167

3

Using dplyr::bind_cols() and pipes:

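library(dplyr)

new_vars <- c("foo", "bar", "bar2")

# Build a named list of NA values (one per new column) and bind it on
tibble(a = 1:5,
       b = LETTERS[1:5]) %>% 
  bind_cols(., setNames(lapply(new_vars, function(x) NA), new_vars))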

Result:

# A tibble: 5 × 5
      a b     foo   bar   bar2 
  <int> <chr> <lgl> <lgl> <lgl>
1     1 A     NA    NA    NA   
2     2 B     NA    NA    NA   
3     3 C     NA    NA    NA   
4     4 D     NA    NA    NA   
5     5 E     NA    NA    NA

Although I think the second answer to this question, on which this is based, is just as good.

If you really want mutate, Ritchie's answer in the comments works.

edited Aug 09 '23 at 01:44

answered Aug 09 '23 at 01:39


Thanks, I was just looking at that answer. It looks so unwieldy compared to the `dat[new_vars] <- NA` approach, but at least it is tidy `compatible`. I'm hoping there is a neater solution though. – Earlien Aug 09 '23 at 01:44

Michael Dewar · Answer 4 · 2023-08-09T02:53:40.880

Maybe this is the style you're looking for:

library(dplyr)

dat <- tibble(
    a = 1:5,
    b = LETTERS[1:5]
)

new_vars <- c("foo", "bar", "bar2")

dat %>% 
    purrr::reduce(new_vars, ~mutate(.x, {{.y}} := 0), .init = .)

Instead of using across() we use purrr::reduce() which will loop over the new_vars. We apply the mutate function to the output of the previous iteration. We want to start with .init = dat, but we pipe it in.

You could even use reduce2 if you wanted to have different values for each of the new_vars.
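
A rough sketch of the `reduce2` idea, assuming a hypothetical `new_vals` list holding one fill value per new column. It uses `!!nm :=` to inject the column name, which is a different spelling from the `{{.y}}` used above:

```r
library(dplyr)

dat <- tibble(a = 1:5, b = LETTERS[1:5])

new_vars <- c("foo", "bar", "bar2")
# Hypothetical fill values, one per entry in new_vars
new_vals <- list(NA, 0, "none")

# reduce2() walks new_vars and new_vals in parallel; the accumulator `df`
# starts at .init and gains one column per step
out <- purrr::reduce2(
  new_vars, new_vals,
  \(df, nm, val) mutate(df, !!nm := val),
  .init = dat
)
```

Each value is recycled to the number of rows, so `foo` is all `NA`, `bar` is all `0` and `bar2` is all `"none"`.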
