0

I have the following dataframe

FileNumber<-c("510708396","510708396","510708396","510708485","510667325")
EventCode<-c("CASCRT","DISCSENT","DISCSENT","CASCRT","DISCSENT")
EventDate<-c("8/21/2018 12:00:00 AM","12/3/2018 2:41:18 PM","12/3/2018 3:50:16 PM","8/23/2018 12:00:00 AM","12/12/2018 9:11:28 AM")

df<-data.frame(FileNumber,EventCode,EventDate)
  FileNumber EventCode             EventDate
1  510708396    CASCRT 8/21/2018 12:00:00 AM
2  510708396  DISCSENT  12/3/2018 2:41:18 PM
3  510708396  DISCSENT  12/3/2018 3:50:16 PM
4  510708485    CASCRT 8/23/2018 12:00:00 AM
5  510667325  DISCSENT 12/12/2018 9:11:28 AM

I want to change this long format dataframe into a wide format data with using EventCodes CASRT and DISCSENT as the column names. I tried the following

library(reshape2)
dcast(df,FileNumber~EventCode,value.var = "EventDate")

however I recieve the following and a message that "Aggregation function missing: defaulting to length" where as I was expecting the EventDate values.

  FileNumber CASCRT DISCSENT
1  510667325      0        1
2  510708396      1        2
3  510708485      1        0

I'm guessing this has something do to do with the non-unique values in the FileNumber how do I make sure that I get the Event Date values instead of 1's and 0's.

zx8754
  • 52,746
  • 12
  • 114
  • 209
Nathan123
  • 763
  • 5
  • 18

1 Answers1

0

You get this error because there are multiple rows with same EventNumber and EventCode. When trying to cast the data into wide format, reshape does not know how to handle multiple values and uses its fallback solution which is lenght (i.e. counting how many elements there are in this cell)

You need to decide how you want to proceed in the case where there are more than value per cell.

You could transform the EventDate column to date-time format, so that you can compute the mean value. Or use only the max or min.

If you want to keep each date in a list, I'd highly suggest using tidyr s pivot_wider function:

FileNumber<-c("510708396","510708396","510708396","510708485","510667325")
EventCode<-c("CASCRT","DISCSENT","DISCSENT","CASCRT","DISCSENT")
EventDate<-c("8/21/2018 12:00:00 AM","12/3/2018 2:41:18 PM","12/3/2018 3:50:16 PM","8/23/2018 12:00:00 AM","12/12/2018 9:11:28 AM")

df<-data.frame(FileNumber,EventCode,EventDate)

library(dplyr)
library(tidyr)

df2 <- df %>%
  pivot_wider(names_from = EventCode,
              values_from = EventDate)

This raises a warning, but puts the multiple elements in a list:

df2 is now:

# A tibble: 3 x 3
  FileNumber      CASCRT    DISCSENT
  <fct>      <list<fct>> <list<fct>>
1 510708396          [1]         [2]
2 510708485          [1]         [0]
3 510667325          [0]         [1]

And we can access the elements in the list:

df2$DISCSENT[1]

Returns:

list_of<factor<b7763>>[1]>
[[1]]
[1] 12/3/2018 2:41:18 PM 12/3/2018 3:50:16 PM
5 Levels: 12/12/2018 9:11:28 AM ... 8/23/2018 12:00:00 AM
dario
  • 6,415
  • 2
  • 12
  • 26