0

I have a question about matrix structure manipulation in R, here I need to first transpose the matrix and combine the month and status columns, filling the missing values with 0. Here I have an example, currently my data is like belows. It seems very tricky. I would appreciate if anyone could help on this. Thank you.

Hi, my data looks like the follows:

  structure(list(Customer = c("1096261", "1096261", "1169502", 
    "1169502"), Phase = c("2", "3", "1", "2"), Status = c("Ontime", 
    "Ontime", "Ontime", "Ontime"), Amount = c(21216.32, 42432.65, 
    200320.05, 84509.24)), .Names = c("Customer", "Phase", "Status", 
    "Amount"), row.names = c(NA, -4L), class = c("grouped_df", "tbl_df", 
    "tbl", "data.frame"), vars = c("Customer", "Phase"), drop = TRUE, indices 
    = list(
    0L, 1L, 2L, 3L), group_sizes = c(1L, 1L, 1L, 1L), biggest_group_size = 1L, 
    labels = structure(list(
    Customer = c("1096261", "1096261", "1169502", "1169502"), 
    Phase = c("2", "3", "1", "2")), row.names = c(NA, -4L), class = 
    "data.frame", vars = c("Customer", 
    "Phase"), drop = TRUE, .Names = c("Customer", "Phase")))   

I need to have the reshaped matrix with the following columns:
Customer Phase1earlyTotal Phase2earlyTotal....Phase4earlyTotal...Phase1_ Ontimetotal...Phase4_Ontimetotal...Phase1LateTotal_Phase4LateTotal. For example Phase1earlytotal includes the sum of the amount with the Phase=1 and Status=Early.

Currently I use the following scripts, which does not work, coz I dont know how to combine Phase and Stuatus Column.

   mydata2<-data.table(mydata2,V3,V4)
    mydata2$V4<-NULL
    datacus <- data.frame(mydata2[-1,],stringsAsFactors = F); 
    datacus <- datacus %>% mutate(Phase= as.numeric(Phase),Amount= 
   as.numeric(Amount)) %>%
   complete(Phase = 1:4,fill= list(Amount = 0)) %>% 
   dcast(datacus~V3, value.var = 'Amount',fill = 0) %>% select(Phase, V3) 
   %>%t()
Cherry
  • 73
  • 6
  • 2
    Hi, please [provide code and data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). – jay.sf Jan 29 '19 at 10:35
  • You may use `dcast` from `reshape2` or `data.table` – akrun Jan 29 '19 at 10:36
  • Hi, my data looks like the follows: Customer Phase Status Amount 1 1096261 2 Ontime 21216. 2 1096261 3 Ontime 42433. 3 1169502 1 Ontime 200320. 4 1169502 2 Ontime 84509. 5 1169502 3 Ontime 863940. 6 1172386 1 Ontime 467078. – Cherry Jan 29 '19 at 10:40
  • 1
    @Cherry Please update your question and include the output of `dput(your_data)` there. See the link posted by @jay.sf for a reference. – markus Jan 29 '19 at 10:45

1 Answers1

0

I believe you are looking for somethink like this?

sample data

df <- structure(list(Customer = c("1096261", "1096261", "1169502", 
                            "1169502"), Phase = c("2", "3", "1", "2"), Status = c("Ontime", 
                                                                                  "Ontime", "Ontime", "Ontime"), Amount = c(21216.32, 42432.65, 
                                                                                                                            200320.05, 84509.24)), .Names = c("Customer", "Phase", "Status", 
                                                                                                                                                              "Amount"), row.names = c(NA, -4L), class = c("grouped_df", "tbl_df", 
                                                                                                                                                                                                           "tbl", "data.frame"), vars = c("Customer", "Phase"), drop = TRUE, indices 
          = list(
            0L, 1L, 2L, 3L), group_sizes = c(1L, 1L, 1L, 1L), biggest_group_size = 1L, 
          labels = structure(list(
            Customer = c("1096261", "1096261", "1169502", "1169502"), 
            Phase = c("2", "3", "1", "2")), row.names = c(NA, -4L), class = 
              "data.frame", vars = c("Customer", 
                                     "Phase"), drop = TRUE, .Names = c("Customer", "Phase")))   

#    Customer Phase Status    Amount
# 1:  1096261     2 Ontime  21216.32
# 2:  1096261     3 Ontime  42432.65
# 3:  1169502     1 Ontime 200320.05
# 4:  1169502     2 Ontime  84509.24

code

library( data.table )
dcast( setDT( df ), Customer ~ Phase + Status, fun = sum, value.var = "Amount" )[]

output

#    Customer 1_Ontime 2_Ontime 3_Ontime
# 1:  1096261        0 21216.32 42432.65
# 2:  1169502   200320 84509.24     0.00
Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • Hi, thank you for the reply. Actually I need 12 columns, Customer 1_Ontime, 2_Ontime, 3_Ontime, 4 Ontime, 1 Late, 2 Late, 3 Late, 4 Late, 1 Early, 2 Early, 3 Early, 4 Early, could you pls give some advice on that? – Cherry Jan 30 '19 at 03:36
  • Thank you very much for your help. I think it is a great way to solve the problem. – Cherry Jan 30 '19 at 06:01
  • @Cherry As you already may have noticed, the extra columns you desire are created automatically, based on the unique values in the `Phase` and `Status` columns... Since the sample-data only contains 3 unique phases, and 1 unique status, you'll end up with Customer + 3 * 1 columns. – Wimpel Jan 30 '19 at 07:22
  • Yes, exactly. I noticed that after replying to your comments, now it is perfectly working. Thanks a lot for the help! – Cherry Jan 30 '19 at 09:19