2

I would like to take a feature and spread it's values as columns with 1/0 if true/false e.g.

mtcars %>% 
  pivot_wider(names_from = cyl,
              values_from = 1)

This appears to have done something, cyl has now been spread into columns except the values are things like 21, 21.4 or NA.

> mtcars %>% 
+   pivot_wider(names_from = cyl,
+               values_from = 1)
# A tibble: 32 x 12
    disp    hp  drat    wt  qsec    vs    am  gear  carb   `6`   `4`   `8`
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  160    110  3.9   2.62  16.5     0     1     4     4  21    NA    NA  
 2  160    110  3.9   2.88  17.0     0     1     4     4  21    NA    NA  
 3  108     93  3.85  2.32  18.6     1     1     4     1  NA    22.8  NA  
 4  258    110  3.08  3.22  19.4     1     0     3     1  21.4  NA    NA  
 5  360    175  3.15  3.44  17.0     0     0     3     2  NA    NA    18.7
 6  225    105  2.76  3.46  20.2     1     0     3     1  18.1  NA    NA  
 7  360    245  3.21  3.57  15.8     0     0     3     4  NA    NA    14.3
 8  147.    62  3.69  3.19  20       1     0     4     2  NA    24.4  NA  
 9  141.    95  3.92  3.15  22.9     1     0     4     2  NA    22.8  NA  
10  168.   123  3.92  3.44  18.3     1     0     4     4  19.2  NA    NA 

I tried using values_fill like so:

> mtcars %>% 
+   pivot_wider(names_from = cyl,
+               values_from = 1,
+               values_fill = list(1 = 0))
Error: unexpected '=' in:
"              values_from = 1,
              values_fill = list(1 ="

How can I spread cyl across columns with binary 1 or 0 values depending on whether or not the cyl is 4, 6 or 8?

Is pivot_wider() what I want?

Doug Fir
  • 19,971
  • 47
  • 169
  • 299
  • 1
    ```values_from = 1``` gets the values from the first column, i.e. `mpg`. 21, 21.4, etc. that you see are from `mpg`!!! – M-- Nov 11 '19 at 23:18
  • So like make dummy variables from `cyl`? – camille Nov 11 '19 at 23:25
  • Couple posts that seem related: https://stackoverflow.com/q/35663580/5325862, https://stackoverflow.com/q/49276914/5325862 – camille Nov 11 '19 at 23:31

2 Answers2

1

Set mpg to 1 and set the fill for mpg to 0 like this:

mtcars %>%
  mutate(mpg = 1) %>%
  pivot_wider(names_from = cyl, values_from = mpg, values_fill = list(mpg = 0))
## # A tibble: 32 x 12
##     disp    hp  drat    wt  qsec    vs    am  gear  carb   `6`   `4`   `8`
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  160    110  3.9   2.62  16.5     0     1     4     4     1     0     0
##  2  160    110  3.9   2.88  17.0     0     1     4     4     1     0     0
##  3  108     93  3.85  2.32  18.6     1     1     4     1     0     1     0
## ... etc ...

or given the problem that pivot_wider currently has with ordering the columns you may prefer the older spread:

mtcars %>%
  mutate(mpg = 1) %>%
  spread(cyl, mpg, fill = 0)
##     disp  hp drat    wt  qsec vs am gear carb 4 6 8
## 1   71.1  65 4.22 1.835 19.90  1  1    4    1 1 0 0
## 2   75.7  52 4.93 1.615 18.52  1  1    4    2 1 0 0
## 3   78.7  66 4.08 2.200 19.47  1  1    4    1 1 0 0
## ... etc ...

Alternately specify values_fn like this:

mtcars %>%
  pivot_wider(names_from = cyl, values_from = mpg, 
    values_fn = list(mpg = ~ 1), values_fill = list(mpg = 0))
## # A tibble: 32 x 12
##     disp    hp  drat    wt  qsec    vs    am  gear  carb   `6`   `4`   `8`
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  160    110  3.9   2.62  16.5     0     1     4     4     1     0     0
##  2  160    110  3.9   2.88  17.0     0     1     4     4     1     0     0
##  3  108     93  3.85  2.32  18.6     1     1     4     1     0     1     0
## ...etc...
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Although I'm reading the docs page for pivot_wider, I find this hard to follow: `values_fn = list(mpg = ~ 1), values_fill = list(mpg = 0)`. What exactly is happening there? Especially confused by the tilde ~ – Doug Fir Nov 11 '19 at 23:40
  • 1
    `values_fn` takes a named list with the names being the column names that are to be used. In this case we want to apply the function to each value in the `mpg` column so we used that as the name. `~ 1` is a shorthand for `function(x) 1`, i.e. a function which returns 1. This notation is supported by many tidyverse functions. – G. Grothendieck Nov 11 '19 at 23:59
1

An option would be to use the names and values from cyl and then recode this based on is.na:

mtcars %>% 
  pivot_wider(names_from = cyl,
              values_from = cyl) %>% 
  mutate_at(vars(!!!syms(as.character(unique(mtcars$cyl)))), ~if_else(is.na(.), 0, 1))

# A tibble: 32 x 13
#     mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb   `6`   `4`   `8`
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  21    160    110  3.9   2.62  16.5     0     1     4     4     1     0     0
# 2  21    160    110  3.9   2.88  17.0     0     1     4     4     1     0     0
# 3  22.8  108     93  3.85  2.32  18.6     1     1     4     1     0     1     0
# 4  21.4  258    110  3.08  3.22  19.4     1     0     3     1     1     0     0
# 5  18.7  360    175  3.15  3.44  17.0     0     0     3     2     0     0     1
# 6  18.1  225    105  2.76  3.46  20.2     1     0     3     1     1     0     0
# 7  14.3  360    245  3.21  3.57  15.8     0     0     3     4     0     0     1
# 8  24.4  147.    62  3.69  3.19  20       1     0     4     2     0     1     0
# 9  22.8  141.    95  3.92  3.15  22.9     1     0     4     2     0     1     0
#10  19.2  168.   123  3.92  3.44  18.3     1     0     4     4     1     0     0
Matt
  • 2,947
  • 1
  • 9
  • 21