-1

I've been trying to reshape this long data into wide format by ID within excel with no luck. I tried using dcast but it did not give me the results I was expecting.

I've attached a csv of how the data is currently formatted in cells: (a1:c10), and cells: (f1:n4) is how I'd like it to be formatted. I tried in excel first, but have no experience with power query and thought perhaps reshape2 or dcast could do something similar.

csv screenshot

In r I did:

olddata_wide$ID <- factor(olddata_wide$ID)

widedf <- dcast(df1, ID  ~ paydate, value.var="Type")

This just gave me an output of dates.

zephryl
  • 14,633
  • 3
  • 11
  • 30
willr
  • 7
  • 1
  • 1
    `df %>% dplyr::mutate(trans_no = row_number(), .by = ID) %>% tidyr::pivot_wider(names_from = trans_no, values_from = paydate:type)`. Note you’ll have to update to dplyr >=1.1.0. See [this thread](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) for more on reshaping from long to wide, and [this](https://stackoverflow.com/q/2185252/17303805) for wide to long. – zephryl Feb 25 '23 at 01:22
  • 1
    Also for your next question, please either post data in a copy-pasteable form, eg using `dput()`, or use a built-in dataset. See [How to make a great R reproducible example](https://stackoverflow.com/q/5963269/17303805). – zephryl Feb 25 '23 at 01:27

1 Answers1

-1

Using pivot_wider and rename

library(dplyr)
library(tidyr)

repl <- c("1st_transaction" = "type_1", "2nd_transaction" = "type_2", 
  "3rd_transaction" = "type_3", "4th_transaction" = "type_4")

df %>% 
  mutate(n = row_number(), .by = ID) %>% 
  pivot_wider(names_from = n, values_from = c(type, paydate)) %>% 
  rename(all_of(repl))
# A tibble: 3 × 9
  ID     1st_transacti…¹ 2nd_t…² 3rd_t…³ 4th_t…⁴ payda…⁵ payda…⁶ payda…⁷ payda…⁸
  <chr>  <chr>           <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
1 AAL100 H               H       B       NA      8/28/2… 8/28/2… 8/28/2… NA     
2 AAC926 H               H       NA      NA      8/28/2… 8/28/2… NA      NA     
3 ABR765 V               H       H       B       8/17/2… 8/28/2… 8/28/2… 8/28/2…
# … with abbreviated variable names ¹​`1st_transaction`, ²​`2nd_transaction`,
#   ³​`3rd_transaction`, ⁴​`4th_transaction`, ⁵​paydate_1, ⁶​paydate_2, ⁷​paydate_3,
#   ⁸​paydate_4

Data

df <- structure(list(ID = c("AAL100", "AAL100", "AAL100", "AAC926", 
"AAC926", "ABR765", "ABR765", "ABR765", "ABR765"), paydate = c("8/28/2019", 
"8/28/2020", "8/28/2021", "8/28/2017", "8/28/2018", "8/17/2016", 
"8/28/2020", "8/28/2021", "8/28/2022"), type = c("H", "H", "B", 
"H", "H", "V", "H", "H", "B")), class = "data.frame", row.names = c(NA, 
-9L))
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29
  • Hey Sorry for just getting back to you. This was super helpful & cant thank you enough! – willr Mar 01 '23 at 17:15
  • Thx for mentioning! Glad it helped! Even though it was marked as a duplicate question the renaming element makes it unique imho, so if you wish you can mark it as solved, it might help others too. – Andre Wildberg Mar 01 '23 at 18:54