I have a data table and I would like to extract a subset of it based on a few conditions - basically, I would like a subset such that each date and time is unique for each group, while also taking the first value that occurs for each time. For example, say I have a data table that looks like this:
Group | Time | Value
------------------------------------
a | 2010-07-13 20:00:00 | 1
a | 2010-07-13 20:00:00 | 2
a | 2010-07-13 20:00:01 | 3
a | 2010-07-13 20:00:02 | 4
a | 2010-07-14 20:00:00 | 5
a | 2010-07-14 20:00:01 | 6
b | 2010-07-13 20:00:00 | 7
b | 2010-07-13 20:00:00 | 8
b | 2010-07-13 20:00:01 | 9
b | 2010-07-14 20:00:00 | 10
If I subset this data table based on the conditions I mentioned above, it should now look like this:
Group | Time | Value
-------------------------------------
a | 2010-07-13 20:00:00 | 1
a | 2010-07-13 20:00:01 | 3
a | 2010-07-13 20:00:02 | 4
a | 2010-07-14 20:00:00 | 5
a | 2010-07-14 20:00:01 | 6
b | 2010-07-13 20:00:00 | 7
b | 2010-07-13 20:00:01 | 9
b | 2010-07-14 20:00:00 | 10
I'm not really too sure how to go about doing this, so if anyone can help me then I would really appreciate it, thanks! I've also attached the code below to generate the first data table
cc <- c("a","a", "a","a", "a","a", "b","b", "b","b")
aa <- as.data.table(c(as.POSIXct("2010-07-13 20:00:00", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:00", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:01", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:02", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-14 20:00:00", format = "%Y-%m-%d %H:%M:%S") , as.POSIXct("2010-07-14 20:00:01", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:00", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:0", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:01", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-14 20:00:00", format = "%Y-%m-%d %H:%M:%S")))
bb <- c(1,2,3,4,5,6,7,8,9,10)
dd <- cbind(cc,aa,bb)
rm(aa)
colnames(dd) <- c("Group", "Time", "Value")
And here is the dput
output:
structure(list(Group = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b"),
Time = structure(c(1279015200, 1279015200, 1279015201, 1279015202, 1279101600, 1279101601, 1279015200, 1279015200, 1279015201,
1279101600), class = c("POSIXct", "POSIXt")), Value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), .Names = c("Group", "Time", "Value"), row.names = c(NA, -10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer:0x0000000000100788>)