0

I have data for Loan instalment repayment by different customers.

The data includes basic details of customers like Name, Age, Salary, region, credit score etc.

There are some time variant fields as well like outstanding balance which is updated every month (but I am not really concerned with this data).

In my data set I have 6 rows for every customer (1 for each month till past 6 months). Each row contains a column called status (paid/not paid).

Example dataset looks as below:

Customer
Number      Age   Balance  Status
A123        34    1000     Paid
A123        34    2000     UnPaid
A123        34    1500     UnPaid
A123        34    1400     Paid
A123        34    1300     Paid
A123        34    1400     UnPaid
A234        36    1000     Paid
A234        36    4000     Paid
A234        36    3000     UnPaid
A234        36    5000     Paid
A234        36    6000     UnPaid
A234        36    1700     Paid

I want to convert the data in below format:

Customer
Number      Age Balance  Status1    Status2 Status3 Status4   Status5 Status6
A123        34  1000     Paid       UnPaid  UnPaid   Paid     Paid    UnPaid
A234        36  1000     Paid       Paid    UnPaid   Paid     UnPaid  Paid

Please note that, I am just interested in the latest values of other columns (Age, Balance etc.)

I have tried the casting() function but it did not do what I was after.

Can anyone help with this?

morgan121
  • 2,213
  • 1
  • 15
  • 33
SKB
  • 189
  • 1
  • 13
  • 1
    Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – morgan121 Dec 05 '19 at 05:24

1 Answers1

1

We can select the first value of Balance for each Number and create a column to assign unique Status number and then get the data in wide format.

library(dplyr)

df %>%
  group_by(Number) %>%
  mutate(Balance = first(Balance), 
         col = paste0("Status", row_number())) %>%
  tidyr::pivot_wider(names_from = col, values_from = Status)

#  Number   Age Balance Status1 Status2 Status3 Status4 Status5 Status6
#  <fct>  <int>   <int> <fct>   <fct>   <fct>   <fct>   <fct>   <fct>  
#1 A123      34    1000 Paid    UnPaid  UnPaid  Paid    Paid    UnPaid 
#2 A234      36    1000 Paid    Paid    UnPaid  Paid    UnPaid  Paid   

data

df <- structure(list(Number = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("A123", "A234"), class = "factor"), 
Age = c(34L, 34L, 34L, 34L, 34L, 34L, 36L, 36L, 36L, 36L, 
36L, 36L), Balance = c(1000L, 2000L, 1500L, 1400L, 1300L, 
1400L, 1000L, 4000L, 3000L, 5000L, 6000L, 1700L), Status = structure(c(1L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L), .Label = c("Paid", 
"UnPaid"), class = "factor")), class = "data.frame", row.names = c(NA, -12L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213