-1

I am looking for a way to reshape my data:

> test
      policyID startYear   product
1: G000246-000      2014 Product 1
2: G000246-000      2014 Product 2
3: G000246-000      2014 Product 3
4: G000246-000      2015 Product 1
5: G000246-000      2015 Product 2
6: G000246-000      2015 Product 3

to this:

     policyID       2014         2015
1: G000246-000    Product 1    Product 1
2: G000246-000    Product 2    Product 2
3: G000246-000    Product 3    Product 3

I have tried:

  reshape(test, idvar = "policyID", timevar = "startYear", direction = "wide")

but i get:

      policyID product.2014 product.2015
1: G000246-000    Product 1    Product 1

Whats the best approach to get to my desired results?

DATA:

structure(list(policyID = c("G000246-000", "G000246-000", "G000246-000", 
"G000246-000", "G000246-000", "G000246-000"), startYear = c(2014, 
2014, 2014, 2015, 2015, 2015), product = c("Product 1", "Product 2", 
"Product 3", "Product 1", "Product 2", "Product 3")), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"))
s_baldur
  • 29,441
  • 4
  • 36
  • 69
Nneka
  • 1,764
  • 2
  • 15
  • 39
  • 1
    Since you use `data.table`, try `dcast(test, policyID + rowid(startYear) ~ startYear)`. In your data you miss a unique identifier. – markus Jul 02 '20 at 12:50
  • @markus the startyear gets modified to something else – Nneka Jul 02 '20 at 12:58

2 Answers2

2

A tidyr solution, although obtaining a warning message because there is no unique identifier in you dataset, would be

library(tidyr)

test %>% 
  pivot_wider(policyID, names_from = startYear, values_from = product) %>%
  unnest(starts_with("2"))   # or unnest(everything()) ; it depends on which are your other columns

# A tibble: 3 x 3
#   policyID    `2014`    `2015`   
#   <chr>       <chr>     <chr>    
# 1 G000246-000 Product 1 Product 1
# 2 G000246-000 Product 2 Product 2
# 3 G000246-000 Product 3 Product 3
Ric S
  • 9,073
  • 3
  • 25
  • 51
  • this unfortunately doesnt produce any results: 'In addition: Warning message: Values in `product` are not uniquely identified; output will contain list-cols. * Use `values_fn = list(product = list)` to suppress this warning. * Use `values_fn = list(product = length)` to identify where the duplicates arise * Use `values_fn = list(product = summary_fun)` to summarise duplicates ' – Nneka Jul 02 '20 at 13:00
  • With the `test` sample data you provided at the beginning of your question (without the `.internal.selfref` argument), I obtain the output shown in my answer. My `tidyr` version is `1.1.0` and R `4.0.0`. Maybe they are different from yours. Furthermore, I specified that you obtain a warning message because you have no unique identifier in your dataset, so it's normal that the warning message appears. – Ric S Jul 02 '20 at 13:05
1

Very similar to markus' comment:

test[, dcast(.SD, policyID + product ~ startYear, value.var = "product")
     ][, !"product"]

      policyID      2014      2015
1: G000246-000 Product 1 Product 1
2: G000246-000 Product 2 Product 2
3: G000246-000 Product 3 Product 3

Data

test <- data.table(
  policyID = c("G000246-000"), 
  startYear = rep(c(2014,2015), each = 3), 
  product = paste("Product", 1:3)
)
s_baldur
  • 29,441
  • 4
  • 36
  • 69