I'm trying to take some data and clean it for end user visibility, however I'm new to R and can't quite seem to figure out how to go about it. Also, this is my first post, so please let me know if there are any formatting or structural issues to the way I wrote this question.
What the data looks like right now:
name | date | reason |
---|---|---|
john | 1/1/2022 | late |
john | 1/2/2022 | late |
john | 1/4/2022 | absent |
betty | 1/3/2022 | absent |
betty | 1/5/2022 | no call |
betty | 1/7/2022 | no call |
kyle | 1/3/2022 | absent |
kyle | 1/5/2022 | no call |
kyle | 1/7/2022 | no call |
I want to see if there's a way to condense this so that for every name, you have the date and reason all on the same line. Like this:
name | date1 | reason1 | date2 | reason2 | date3 | reason3 |
---|---|---|---|---|---|---|
john | 1/1/2022 | late | 1/2/2022 | late | 1/4/2022 | absent |
betty | 1/3/2022 | absent | 1/5/2022 | no call | 1/7/2022 | no call |
kyle | 1/3/2022 | absent | 1/5/2022 | no call | 1/7/2022 | no call |
Alternatively, I tried using dcast
, but my code produced numbers instead of dates.
new db <- dcast(db, name ~ reason, fun.aggregate = list, value.var = "date")
What I wanted:
name | late | absent | no call |
---|---|---|---|
john | 1/1/2022,1/2/2022 | 1/4/2022 | |
betty | 1/3/2022 | 1/5/2022,1/7/2022 | |
kyle | 1/3/2022 | 1/5/2022,1/7/2022 |
What I got:
name | late | absent | no call |
---|---|---|---|
john | c(1620708300,1627236300) | 1639328820 | numeric(0) |
betty | numeric(0) | 1612973940 | c(1611937080, 1612455480) |
kyle | numeric(0) | 1639329540 | c(1635526800, 1639760400) |
EDIT:
I was able to get it inches away from where I need it to be using as.data.frame(pivot_wider(df, names_from=reason, values_from=date, values_fn=list, values_fill=list("")))
thanks to @Andre Wildberg, the last step that I need is to remove the c() from the cells and be able to display clean dates in those fields.
db<-structure(list(name = c("Debby", "Debby", "Debby",
"Debby", "Robert", "Robert", "Robert",
"Ryan", "Ryan", "Ryan", "Ryan",
"Ryan", "Ryan", "Brandon", "Brandon"
), reason = c("Absent", "Leave Early", "Late", "Leave Early",
"Leave Early", "Leave Early", "Absent", "Absent", "Absent", "Absent",
"Absent", "Leave Early", "Late", "Leave Early", "Leave Early"
), date = c("2021-05-11 04:45:00", "2021-05-15 04:02:00", "2021-07-25
18:05:00",
"2021-09-19 20:01:00", "2021-11-25 01:02:00", "2021-12-08 20:56:00",
"2021-12-16 17:30:00", "2021-10-09 17:00:00", "2021-11-07 17:00:00",
"2021-11-12 17:00:00", "2021-11-28 17:00:00", "2021-12-11 01:31:00",
"2021-12-12 17:07:00", "2021-05-03 23:58:00", "2021-05-15 23:31:00"
)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"
))