Convert long form data frame with multiple variables to wide format tidy data

Question

Wanted to check if we can cast multiple columns . For example, as shown below

Data:

structure(list(Date = c("03-01-2016", "04-01-2016", "05-01-2016", 
"06-01-2016", "07-01-2016", "09-01-2016"), Cat = c("A", "A", 
"B", "D", "D", "D"), Sales = c(39L, 28L, 50L, 40L, 39L, 41L), 
    products = c(3L, 6L, 4L, 2L, 1L, 5L)), class = "data.frame", row.names = c(NA, 
-6L))

Date       Cat  Sales   products
03-01-2016  A   39        3
04-01-2016  A   28        6
05-01-2016  B   50        4
06-01-2016  D   40        2
07-01-2016  D   39        1
09-01-2016  D   41        5

Expected Output

Date        Sales _A    Sales _B    Sales _C    Products _A Products _B Products _C
03-01-2016    39           0          0              3          0          0
04-01-2016    28           0          0              6          0          0
05-01-2016     0          50           0             0          4           0
06-01-2016     0          0           40             0          0           2
07-01-2016     0          0           39             0          0           1 
09-01-2016     0          0           41             0          0           5

Len Greski · Answer 1 · 2020-08-08T16:27:28.680

This can be accomplished with pivot_wider() function from version 1.0.0 or newer of the tidyr package.

First, we rename the products column to Products. Then we use pivot_wider() arguments to assign column names from the Cat column of the input data frame, values from the Products and Sales columns, use names_glue to set {.value}_{Cat} as the column names, and fill missing values with 0.

textData <- "Date       Cat  Sales   products
03-01-2016  A   39        3
04-01-2016  A   28        6
05-01-2016  B   50        4
06-01-2016  D   40        2
07-01-2016  D   39        1
09-01-2016  D   41        5"

data <- read.table(text = textData,header=TRUE)
library(tidyr)

data %>% rename(Products = products) %>% 
     pivot_wider(.,Date,names_from=Cat,
                     values_from=c(Sales,Products),
                     names_glue="{.value}_{Cat}",
                     values_fill=0)

...and the output:

# A tibble: 6 x 7
  Date       Sales_A Sales_B Sales_D Products_A Products_B Products_D
  <chr>        <int>   <int>   <int>      <int>      <int>      <int>
1 03-01-2016      39       0       0          3          0          0
2 04-01-2016      28       0       0          6          0          0
3 05-01-2016       0      50       0          0          4          0
4 06-01-2016       0       0      40          0          0          2
5 07-01-2016       0       0      39          0          0          1
6 09-01-2016       0       0      41          0          0          5
>

Explaining names_glue

The tidyverse uses the glue package to interpret string literals, evaluate and insert them into argument strings of R functions. tidyr combines glue with a pivot specification to determine exactly how to name columns when pivoting a data frame from narrow format to wide, or vise versa.

Per the tidyr documentation, names_glue="{.value}_{Cat}" is interpreted to generate a specification data frame that controls how columns in the output data frame are named.

The pivot spec is a data frame with one row for each column in the wide format version of the data that is not present in the long format, and two special columns that start with .: .name which gives the name of the column, and .value provides the name of the column that the values in the cells will go into.

For the pivot_wider() call in this question, the pivot spec data frame looks like this.

# A tibble: 6 x 3
  .name      .value   Cat  
  <chr>      <chr>    <chr>
1 Sales_A    Sales    A    
2 Sales_B    Sales    B    
3 Sales_D    Sales    D    
4 Products_A Products A    
5 Products_B Products B    
6 Products_D Products D    
>

NOTE: in the output data frame I did not include blank spaces in the column names as illustrated in the question because they are a hassle to use in R. Also note that the question has a Sales_C output column when the raw data does not have a C category. Instead, it has a D category.

Also NOTE: tidyr 1.0.0 was introduced in October 2019. tidyr::pivot_wider() replaces the tidyr::spread() function, which has been deprecated as of release 1.0.0.

i tried your solution not sure why it doesn;'t work in my data-frame but the one you have used works perfectly fine. I used @ThomasIsCoding solution that works fine .. — kcm, Sep 28 '20 at 12:14
@krushnachChandra - do you receive an error message? If so, what was the error? If not, what happened? If you would like help debugging your code, please post a new question that includes a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), along with any error messages that were generated by the code. — Len Greski, Sep 28 '20 at 13:35
"please post a new question " I wish i could but i been blocked from asking question in this R section but I will post my question in other section and will share the link here — kcm, Sep 28 '20 at 19:46

score 1 · Answer 2 · answered Aug 08 '20 at 14:04

Here is a base R option using reshape

reshape(df,direction = "wide",idvar = "Date",timevar = "Cat")

which gives

> reshape(df,direction = "wide",idvar = "Date",timevar = "Cat")
        Date Sales.A products.A Sales.B products.B Sales.D products.D
1 03-01-2016      39          3      NA         NA      NA         NA
2 04-01-2016      28          6      NA         NA      NA         NA
3 05-01-2016      NA         NA      50          4      NA         NA
4 06-01-2016      NA         NA      NA         NA      40          2
5 07-01-2016      NA         NA      NA         NA      39          1
6 09-01-2016      NA         NA      NA         NA      41          5

Data

df <- structure(list(Date = c("03-01-2016", "04-01-2016", "05-01-2016", 
"06-01-2016", "07-01-2016", "09-01-2016"), Cat = c("A", "A", 
"B", "D", "D", "D"), Sales = c(39L, 28L, 50L, 40L, 39L, 41L), 
    products = c(3L, 6L, 4L, 2L, 1L, 5L)), class = "data.frame", row.names = c(NA, 
-6L))

The values in `Cat` are as suffix in the column names of expected output — ThomasIsCoding, Aug 09 '20 at 07:42

Convert long form data frame with multiple variables to wide format tidy data

2 Answers2

Explaining names_glue