1

I've got a data set that looks like this:

date, location, value, tally, score
2016-06-30T09:30Z, home, foo, 1,
2016-06-30T12:30Z, work, foo, 2,
2016-06-30T19:30Z, home, bar, , 5

I need to aggregate these rows together, to obtain a result such as:

date, location, value, tally, score
2016-06-30, [home, work], [foor, bar], 3, 5

There are several challenges for me:

  • The resulting row (a daily aggregate) must include the rows for this day (2016-06-30 in my above example
  • Some rows (strings) will result in an array containing all the values present on this day
  • Some others (ints) will result in a sum

I've had a look at dplyr, and if possible I'd like to do this in R.

Thanks for your help!


Edit:

Here's a dput of the data

structure(list(date = structure(1:3, .Label = c("2016-06-30T09:30Z", 
"2016-06-30T12:30Z", "2016-06-30T19:30Z"), class = "factor"), 
    location = structure(c(1L, 2L, 1L), .Label = c("home", "work"
    ), class = "factor"), value = structure(c(2L, 2L, 1L), .Label = c("bar", 
    "foo"), class = "factor"), tally = c(1L, 2L, NA), score = c(NA, 
    NA, 5L)), .Names = c("date", "location", "value", "tally", 
"score"), class = "data.frame", row.names = c(NA, -3L))
Hack-R
  • 22,422
  • 14
  • 75
  • 131
basbabybel
  • 780
  • 8
  • 17

1 Answers1

1
mydat<-structure(list(date = structure(1:3, .Label = c("2016-06-30T09:30Z", 
                                                       "2016-06-30T12:30Z", "2016-06-30T19:30Z"), class = "factor"), 
                      location = structure(c(1L, 2L, 1L), .Label = c("home", "work"
                      ), class = "factor"), value = structure(c(2L, 2L, 1L), .Label = c("bar", 
                                                                                        "foo"), class = "factor"), tally = c(1L, 2L, NA), score = c(NA, 
                                                                                                                                                    NA, 5L)), .Names = c("date", "location", "value", "tally", 
                                                                                                                                                                         "score"), class = "data.frame", row.names = c(NA, -3L))

mydat$date <- as.Date(mydat$date)

require(data.table)
mydat.dt <- data.table(mydat)
mydat.dt <- mydat.dt[, lapply(.SD, paste0, collapse=" "), by = date]

cbind(mydat.dt, aggregate(mydat[,c("tally", "score")], by=list(mydat$date), FUN = sum, na.rm=T)[2:3])

which gives you:

         date       location       value tally score
1: 2016-06-30 home work home foo foo bar     3     5

Note that if you wanted to you could probably do it all in one step in the reshaping of the data.table but I found this to be a quicker and easier way for me to achieve the same thing in 2 steps.

Hack-R
  • 22,422
  • 14
  • 75
  • 131
  • 1
    Thank you. I found some helpful information to understand `lapply` and `.SD` [in this separate question](http://stackoverflow.com/questions/32276887/use-of-lapply-sd-in-data-table-r). – basbabybel Jul 01 '16 at 14:26
  • 1
    @basbabybel Happy to help – Hack-R Jul 01 '16 at 15:01