R: Unstack a column with dates

Question

suppose I have the following dataframe:

df <- data.frame(Order=c("1234567","1234567","1234567","456789","456789"),Stage=c("Pipeline","Proposal","Closed","Pipeline","Lost"),StageChange=c("2008-01-01","2008-01-02","2008-01-03","2008-01-10","2008-01-12"))

Resulting in:

    head(df)
    Order    Stage StageChange
1 1234567 Pipeline  2008-01-01
2 1234567 Proposal  2008-01-02
3 1234567   Closed  2008-01-03
4  456789 Pipeline  2008-01-10
5  456789     Lost  2008-01-12

I need to unstack the "Stage" column and get to a dataframe like this:

    Order   Pipeline   Proposal     Closed       Lost
1 1234567 2008-01-01 2008-01-02 2008-01-03         NA
2  456789 2008-01-10         NA         NA 2008-01-12

I read the documentation and tried different approaches with dplyr and tidyr (like in this thread), but my ignorance is winning.

Any thoughts on to accomplish what I need?

My objective, to make it clear, is to use this data to calculate the number of days a particular Order spent on a specific Stage. Some orders are Lost, others are Closed (Won) and this is why there are "NA" values. Same happens when an order didn't change to a specific stage (an order can go from Pipeline to Lost, without any change to intermediary stages).

Thanks!

Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) — divibisan, Mar 30 '20 at 17:09
@divibisan thanks for the link, but the pivot_wider function looks a better way to go as it is an evolution from spread(). Cheers! — Ricardo Costa, Mar 30 '20 at 18:21

nurandi · Accepted Answer · 2020-03-30T18:22:05.270

2

You might use tidyr::pivot_wider. That's new version of retired-function spread

# install.packages("tidyr")
library(tidyr)

df %>%
  pivot_wider(names_from = Stage, values_from = StageChange)

# # A tibble: 2 x 5
#   Order   Pipeline   Proposal   Closed     Lost      
#   <fct>   <fct>      <fct>      <fct>      <fct>     
# 1 1234567 2008-01-01 2008-01-02 2008-01-03 NA        
# 2 456789  2008-01-10 NA         NA         2008-01-12

edited Mar 30 '20 at 18:22

answered Mar 30 '20 at 17:07

nurandi

1,588
1
11
20

1

Thanks! Didn't know about this function. Worked like a charm! – Ricardo Costa Mar 30 '20 at 18:16

TheSciGuy · Answer 2 · 2020-03-30T17:22:33.357

Dates will be a factor class

library(tidyverse)

df_wide <- df %>%
  tidyr::pivot_wider(names_from = Stage, values_from = StageChange)

df_wide

# A tibble: 2 x 5
  Order   Pipeline   Proposal   Closed     Lost      
  <fct>   <fct>      <fct>      <fct>      <fct>     
1 1234567 2008-01-01 2008-01-02 2008-01-03 NA        
2 456789  2008-01-10 NA         NA         2008-01-12

In case you wanted to convert your dates to Date class

df_wide_dates <- df %>%
  tidyr::pivot_wider(names_from = Stage, values_from = StageChange) %>%
  dplyr::mutate_at(., vars(Pipeline, Proposal, Closed, Lost), as.Date)

df_wide_dates

# A tibble: 2 x 5
  Order   Pipeline   Proposal   Closed     Lost      
  <fct>   <date>     <date>     <date>     <date>    
1 1234567 2008-01-01 2008-01-02 2008-01-03 NA        
2 456789  2008-01-10 NA         NA         2008-01-12

Thanks a lot! I gave the 'answered' flag to @nurandi as it seems his answer with pivot_wider came first. — Ricardo Costa, Mar 30 '20 at 18:19

score 0 · Answer 3 · answered Mar 30 '20 at 17:09

0

using dplyr::spread

library(dplyr)

df %>% 
  spread(Stage,StageChange) %>% 
  select(Order,Pipeline,Proposal,Closed,Lost)

answered Mar 30 '20 at 17:09

ava

840
5
19

1

To stay up to date use `pivot_wider()` instead of `spread()`: "Development on spread() is complete, and for new code we recommend switching to pivot_wider(), which is easier to use, more featureful, and still under active development. df %>% spread(key, value) is equivalent to df %>% pivot_wider(names_from = key, values_from = value)" – TheSciGuy Mar 30 '20 at 17:16

R: Unstack a column with dates

3 Answers3