Unpack a data frame column into multiple columns

Question

I am trying to create a function that will mutate a column if it exists. If the column does exist, I return a data frame with two columns. I'd like help unpacking this data frame column, into its component columns:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

my_transformation = function(df){
  df %>% 
    mutate(across(any_of('cyl'), function(x) tibble(a = x + 3, b = x + 1)))
}

df_1 = as_tibble(mtcars)
df_2 = df_1 %>% select(-cyl)

my_transformation(df_1)
#> # A tibble: 32 x 11
#>      mpg cyl$a    $b  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       9     7  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       9     7  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     7     5  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     9     7  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7    11     9  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     9     7  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3    11     9  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     7     5  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     7     5  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     9     7  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

my_transformation(df_2)
#> # A tibble: 32 x 10
#>      mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21    160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21    160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

^{Created on 2020-08-22 by the reprex package (v0.3.0)}

As you can see, when calling my_transformation(df_1), there are two subcolumns: cyl$a and cyl$b. How do I get these to be regular columns?

I have tried unnest(cyl) but had no success.

I'm not sure what you're trying to do. There is no need to use `across` if you only `mutate` a single column. Can you elaborate? — Maurits Evers, Aug 23 '20 at 03:14
Sure. I'm trying to solve for the fact that my data frame will sometimes have a column, and other times will not. From this post: https://stackoverflow.com/questions/52998471/dynamically-determine-if-a-dataframe-column-exists-and-mutate-if-it-does I was hoping to use `across` and `any_of` to only mutate if a column exists. I will update the question to reflect that. — kmace, Aug 23 '20 at 03:20

score 0 · Answer 1 · answered Aug 23 '20 at 03:31

I think what you're after is something like

mtcars %>% mutate(across(cyl, list(a = ~ .x + 3, b = ~ .x + 1)))
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_a cyl_b
# 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4     9     7
# 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     9     7
# 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1     7     5
# 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1     9     7
# 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    11     9
# 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1     9     7
# ...

Note that the .fns argument of across can take a list of (lambda) functions; so if you replace function(x) tibble(a = ..., b = ...) with list(a = ~ ..., b = ~ ...) the new mutate (dplyr >= 1.0.0) will automatically create columns cyl_a and cyl_b.

score 0 · Answer 2 · answered Aug 23 '20 at 16:52

So the only way I've found to drop the "nesting" for dataframe columns is by not supplying an LHS argument to mutate as documented here

Unfortunately, using across to check for missing columns is not possible, as it uses .names to assign something on the LHS.

Therefore, I'm taking the approach of inserting the missing column if it is missing and then calling mutate without across.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)

my_transformation = function(df){
  cols <- c(cyl = NA_real_)
  df %>% 
    add_column(!!!cols[!names(cols) %in% names(.)]) %>%
    mutate(tibble(a = cyl + 3, b = cyl + 1))
}

df_1 = as_tibble(mtcars)
df_2 = df_1 %>% select(-cyl)

my_transformation(df_1)
#> # A tibble: 32 x 13
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb     a     b
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     9     7
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     9     7
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     7     5
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     9     7
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2    11     9
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     9     7
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    11     9
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     7     5
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     7     5
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     9     7
#> # … with 22 more rows

my_transformation(df_2)
#> # A tibble: 32 x 13
#>      mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb   cyl     a     b
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21    160    110  3.9   2.62  16.5     0     1     4     4    NA    NA    NA
#>  2  21    160    110  3.9   2.88  17.0     0     1     4     4    NA    NA    NA
#>  3  22.8  108     93  3.85  2.32  18.6     1     1     4     1    NA    NA    NA
#>  4  21.4  258    110  3.08  3.22  19.4     1     0     3     1    NA    NA    NA
#>  5  18.7  360    175  3.15  3.44  17.0     0     0     3     2    NA    NA    NA
#>  6  18.1  225    105  2.76  3.46  20.2     1     0     3     1    NA    NA    NA
#>  7  14.3  360    245  3.21  3.57  15.8     0     0     3     4    NA    NA    NA
#>  8  24.4  147.    62  3.69  3.19  20       1     0     4     2    NA    NA    NA
#>  9  22.8  141.    95  3.92  3.15  22.9     1     0     4     2    NA    NA    NA
#> 10  19.2  168.   123  3.92  3.44  18.3     1     0     4     4    NA    NA    NA
#> # … with 22 more rows

^{Created on 2020-08-23 by the reprex package (v0.3.0)}

Not a huge fan of the solution. but it does work. I'm considering creating a github issue for instances where you want to return an output data frame column, using mutate but only if an input column exists.

Unpack a data frame column into multiple columns

2 Answers2