
With dplyr it is easy to create a new column using mutate:

> df <- data.frame(v1 = 1:3, v2 = c('a','b','c'))
> mutate(df, newcol = NA)
  v1 v2 newcol
1  1  a     NA
2  2  b     NA
3  3  c     NA

We can also create multiple new columns from a vector of names using mutate_at:

> cnames <- c('newcol1', 'newcol2', 'newcol3')
> mutate_at(df, cnames, funs(log(v1)))
  v1 v2   newcol1   newcol2   newcol3
1  1  a 0.0000000 0.0000000 0.0000000
2  2  b 0.6931472 0.6931472 0.6931472
3  3  c 1.0986123 1.0986123 1.0986123

Is there a simple way to initialize these new columns as NA using dplyr?

For example, mutate_at(df, cnames, funs(v1 * NA)) gives the desired result, but that seems indirect. What I would like is something along the lines of:

mutate_at(df, cnames, funs(. = NA)) # Error: Can't create call to non-callable object

where we don't need to know the names of any other columns.

(I know this is simply solved with df[ , cnames] <- NA, but I'm looking for a solution using dplyr functions)


EDIT:

Using later versions of dplyr the example becomes:

mutate_at(df, all_of(cnames), funs(log(v1)))
C. Braun
  • Not sure if it is a bug. You don't need `v1` there; any number would be sufficient, e.g. 1 or 0: `mutate_at(df, cnames, funs(NA * 0))`, or even use `+` instead. – akrun Mar 14 '18 at 14:33
  • When I try your example, I get the following error message: "Error: Can't subset columns that don't exist. x Column `newcol1` doesn't exist." Apparently the names in the vector `cnames` are expected to already exist as columns. How did you get your code to work? I'm working with dplyr version 0.8.3 and R version 3.6.3. – Adriaan Nering Bögel Jan 06 '21 at 12:07
  • @AdriaanNeringBögel, updated – now I think you need to use `all_of`. – C. Braun Jan 08 '21 at 01:35
  • @C.Braun it still does not work. RStudio would give a warning if `all_of` was required. – Adriaan Nering Bögel Jan 08 '21 at 07:41
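
The error reported in the comments above suggests that newer dplyr versions no longer let mutate_at create columns that do not exist yet. One way around it, offered as a minimal sketch that assumes dplyr is installed, is to build a named list of NA values and splice it into mutate() with `!!!`. The names `na_cols` and `out` are illustrative and do not come from the original post.

```r
library(dplyr)

df <- data.frame(v1 = 1:3, v2 = c('a', 'b', 'c'))
cnames <- c('newcol1', 'newcol2', 'newcol3')

# One NA per new column, named after the target columns
# (na_cols is a hypothetical helper name, not part of dplyr)
na_cols <- setNames(rep(list(NA), length(cnames)), cnames)

# `!!!` splices the named list into mutate() as newcol1 = NA, newcol2 = NA, ...
out <- mutate(df, !!!na_cols)
```

This does not need the name of any existing column, which is what the question asks for.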

1 Answer


You could do this.

library(dplyr)
df %>% 
 `is.na<-`(cnames)
#  v1 v2 newcol1 newcol2 newcol3
#1  1  a      NA      NA      NA
#2  2  b      NA      NA      NA
#3  3  c      NA      NA      NA

I hope one %>% is dplyr enough. ;)
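
For anyone wondering why this works: as far as I can tell, a data frame falls through to the default `is.na<-` method, which assigns NA at the given index. With a character vector that amounts to `df[cnames] <- NA`, which creates the missing columns. A base-R sketch of the comparison, with no pipe needed (`a` and `b` are just illustrative names):

```r
df <- data.frame(v1 = 1:3, v2 = c('a', 'b', 'c'))
cnames <- c('newcol1', 'newcol2', 'newcol3')

# Call the replacement function directly, which is what the pipe ends up doing
a <- `is.na<-`(df, cnames)

# The plain assignment it appears to boil down to
b <- df
b[cnames] <- NA

# Compare the two results
identical(a, b)
```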

markus
  • That is simple, but I feel like there should be a way to do this using `dplyr` functions (more of a personal preference for not using backticked functions). I had noticed that `df %>% '[<-'(, cnames, NA)` works as well, but both interrupt the flow of the `dplyr` chain in my opinion. – C. Braun Mar 14 '18 at 15:31
  • @C.Braun You might know that the `magrittr` package provides a set of aliases for cases like this, see `?magrittr::add`. You could define your own, to make piping more pleasant, e.g. `set_NA <- \`is.na<-\`; df %>% set_NA(cnames)`. – markus Mar 14 '18 at 18:27
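
The alias suggested in the last comment can be written out as below. This is a minimal sketch that assumes magrittr is installed; `set_NA` is the name proposed in the comment, and `out` is only an illustrative variable.

```r
library(magrittr)

df <- data.frame(v1 = 1:3, v2 = c('a', 'b', 'c'))
cnames <- c('newcol1', 'newcol2', 'newcol3')

# Alias for the backticked replacement function, so it reads like a normal verb
set_NA <- `is.na<-`

# The pipe passes df as the first argument and cnames as the index to set to NA
out <- df %>% set_NA(cnames)
```

Defining the alias once keeps the backticks out of the pipeline itself, which addresses the readability concern raised in the first comment.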