1

I have a data table and I would like to extract a subset of it based on a few conditions - basically, I would like a subset such that each date and time is unique for each group, while also taking the first value that occurs for each time. For example, say I have a data table that looks like this:

Group |           Time      | Value
------------------------------------
a     | 2010-07-13 20:00:00 | 1
a     | 2010-07-13 20:00:00 | 2
a     | 2010-07-13 20:00:01 | 3
a     | 2010-07-13 20:00:02 | 4
a     | 2010-07-14 20:00:00 | 5
a     | 2010-07-14 20:00:01 | 6
b     | 2010-07-13 20:00:00 | 7
b     | 2010-07-13 20:00:00 | 8
b     | 2010-07-13 20:00:01 | 9
b     | 2010-07-14 20:00:00 | 10

If I subset this data table based on the conditions I mentioned above, it should now look like this:

Group |      Time           |  Value
-------------------------------------   
a     | 2010-07-13 20:00:00 | 1
a     | 2010-07-13 20:00:01 | 3
a     | 2010-07-13 20:00:02 | 4
a     | 2010-07-14 20:00:00 | 5
a     | 2010-07-14 20:00:01 | 6
b     | 2010-07-13 20:00:00 | 7
b     | 2010-07-13 20:00:01 | 9
b     | 2010-07-14 20:00:00 | 10

I'm not really too sure how to go about doing this, so if anyone can help me then I would really appreciate it, thanks! I've also attached the code below to generate the first data table

cc <- c("a","a", "a","a", "a","a", "b","b", "b","b") aa <- as.data.table(c(as.POSIXct("2010-07-13 20:00:00", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:00", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:01", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:02", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-14 20:00:00", format = "%Y-%m-%d %H:%M:%S") , as.POSIXct("2010-07-14 20:00:01", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:00", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:0", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-13 20:00:01", format = "%Y-%m-%d %H:%M:%S"), as.POSIXct("2010-07-14 20:00:00", format = "%Y-%m-%d %H:%M:%S"))) bb <- c(1,2,3,4,5,6,7,8,9,10) dd <- cbind(cc,aa,bb) rm(aa) colnames(dd) <- c("Group", "Time", "Value")

And here is the dput output:

structure(list(Group = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b"), 
Time = structure(c(1279015200, 1279015200, 1279015201, 1279015202, 1279101600, 1279101601, 1279015200, 1279015200, 1279015201, 
1279101600), class = c("POSIXct", "POSIXt")), Value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), .Names = c("Group", "Time", "Value"), row.names = c(NA, -10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer:0x0000000000100788>)
Psherman
  • 23
  • 2
  • try `dt[, .SD[1], .(Group, Time)]` or `dt[dt[, .I[1], .(Group, Time)]$V1]` – akrun Jun 15 '18 at 02:57
  • Thanks! I just have one more question, say for instance there is a 4th column called "Item" in the data table which I don't want to include in the new subset, is there any way I can specify that I don't want the 4th column and that I just want the "Group", "Time" and "Value" column? – Psherman Jun 15 '18 at 03:06
  • You can either do the subsetting after the step i.e. `dt[, .SD[1], .(Group, Time)][, 1:3:, with = FALSE]` or start with the subset of columns i.e. `dt[, 1:3, with = FALSE][, .SD[1], .(Group, Time)]` – akrun Jun 15 '18 at 03:08

0 Answers0