0

I'm trying to take some data and clean it for end user visibility, however I'm new to R and can't quite seem to figure out how to go about it. Also, this is my first post, so please let me know if there are any formatting or structural issues to the way I wrote this question.

What the data looks like right now:

name date reason
john 1/1/2022 late
john 1/2/2022 late
john 1/4/2022 absent
betty 1/3/2022 absent
betty 1/5/2022 no call
betty 1/7/2022 no call
kyle 1/3/2022 absent
kyle 1/5/2022 no call
kyle 1/7/2022 no call

I want to see if there's a way to condense this so that for every name, you have the date and reason all on the same line. Like this:

name date1 reason1 date2 reason2 date3 reason3
john 1/1/2022 late 1/2/2022 late 1/4/2022 absent
betty 1/3/2022 absent 1/5/2022 no call 1/7/2022 no call
kyle 1/3/2022 absent 1/5/2022 no call 1/7/2022 no call

Alternatively, I tried using dcast, but my code produced numbers instead of dates.

new db <- dcast(db, name ~ reason, fun.aggregate = list, value.var = "date")

What I wanted:

name late absent no call
john 1/1/2022,1/2/2022 1/4/2022
betty 1/3/2022 1/5/2022,1/7/2022
kyle 1/3/2022 1/5/2022,1/7/2022

What I got:

name late absent no call
john c(1620708300,1627236300) 1639328820 numeric(0)
betty numeric(0) 1612973940 c(1611937080, 1612455480)
kyle numeric(0) 1639329540 c(1635526800, 1639760400)

EDIT:

I was able to get it inches away from where I need it to be using as.data.frame(pivot_wider(df, names_from=reason, values_from=date, values_fn=list, values_fill=list(""))) thanks to @Andre Wildberg, the last step that I need is to remove the c() from the cells and be able to display clean dates in those fields.

db<-structure(list(name = c("Debby", "Debby", "Debby", 
"Debby", "Robert", "Robert", "Robert", 
"Ryan", "Ryan", "Ryan", "Ryan", 
"Ryan", "Ryan", "Brandon", "Brandon"
), reason = c("Absent", "Leave Early", "Late", "Leave Early", 
"Leave Early", "Leave Early", "Absent", "Absent", "Absent", "Absent", 
"Absent", "Leave Early", "Late", "Leave Early", "Leave Early"
), date = c("2021-05-11 04:45:00", "2021-05-15 04:02:00", "2021-07-25                     
18:05:00", 
"2021-09-19 20:01:00", "2021-11-25 01:02:00", "2021-12-08 20:56:00", 
"2021-12-16 17:30:00", "2021-10-09 17:00:00", "2021-11-07 17:00:00", 
"2021-11-12 17:00:00", "2021-11-28 17:00:00", "2021-12-11 01:31:00", 
"2021-12-12 17:07:00", "2021-05-03 23:58:00", "2021-05-15 23:31:00"
)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"
))
  • Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – NelsonGon Jan 12 '22 at 21:35

1 Answers1

0

Try this if you want to keep the observations combined

library(tidyr)

as.data.frame(pivot_wider(df, names_from=reason, values_from=date, 
  values_fn=list, values_fill=list("")))
     name
1   Debby
2  Robert
3    Ryan
4 Brandon
                                                                              Absent
1                                                                2021-05-11 04:45:00
2                                                                2021-12-16 17:30:00
3 2021-10-09 17:00:00, 2021-11-07 17:00:00, 2021-11-12 17:00:00, 2021-11-28 17:00:00
4                                                                                   
                               Leave Early                Late
1 2021-05-15 04:02:00, 2021-09-19 20:01:00 2021-07-25 18:05:00
2 2021-11-25 01:02:00, 2021-12-08 20:56:00                    
3                      2021-12-11 01:31:00 2021-12-12 17:07:00
4 2021-05-03 23:58:00, 2021-05-15 23:31:00 

Data

df <- structure(list(name = c("Debby", "Debby", "Debby", "Debby", "Robert", 
"Robert", "Robert", "Ryan", "Ryan", "Ryan", "Ryan", "Ryan", "Ryan", 
"Brandon", "Brandon"), reason = c("Absent", "Leave Early", "Late", 
"Leave Early", "Leave Early", "Leave Early", "Absent", "Absent", 
"Absent", "Absent", "Absent", "Leave Early", "Late", "Leave Early", 
"Leave Early"), date = c("2021-05-11 04:45:00", "2021-05-15 04:02:00", 
"2021-07-25 18:05:00", "2021-09-19 20:01:00", "2021-11-25 01:02:00", 
"2021-12-08 20:56:00", "2021-12-16 17:30:00", "2021-10-09 17:00:00", 
"2021-11-07 17:00:00", "2021-11-12 17:00:00", "2021-11-28 17:00:00", 
"2021-12-11 01:31:00", "2021-12-12 17:07:00", "2021-05-03 23:58:00", 
"2021-05-15 23:31:00")), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29