-4

I want to aggregate a dataset that includes time, date, and other variables. I ran into a problem when trying to record the earliest value of one variable for each day. I tried:

dt[, .(new_var1 := dt[time==(min(time)), .(var1)), by = .(month,day)]

But it returns many repeated rows: for a single day, there are many rows instead of one.

Jaap
zack
  • Try `dt[order(time), head(.SD, 1L), .(month, day)]` – akrun Jun 17 '16 at 17:09
  • In case you're wondering about the negative reception of this question, it probably has something to do with the nonsensical code you say you tried (which has brackets that do not match up). It's also preferred that you make a small reproducible example, as covered here: http://stackoverflow.com/a/28481250/ – Frank Jun 17 '16 at 18:22

2 Answers

2

We can use

library(data.table)
dt[order(time), head(.SD, 1L), .(month, day)]
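
To see what this returns, here is a minimal sketch on made-up data. The table contents and the names `first_rows` and `daily` are assumptions for illustration, not taken from the question. The second call is an alternative that may be closer to what was asked: one summary row per day, holding the value of `var1` at the earliest `time`.

```r
library(data.table)

# Hypothetical toy data: two days, several readings per day
dt <- data.table(
  month = c(6, 6, 6, 6, 6),
  day   = c(17, 17, 17, 18, 18),
  time  = c(900, 830, 1015, 1200, 745),
  var1  = c(10, 20, 30, 40, 50)
)

# Sort by time, then keep the first row of each (month, day) group
first_rows <- dt[order(time), head(.SD, 1L), by = .(month, day)]
print(first_rows)

# Alternative: aggregate to one row per day, taking var1 at the earliest time
daily <- dt[, .(new_var1 = var1[which.min(time)]), by = .(month, day)]
print(daily)
```

Note that `:=` is not needed here: inside `.()` a plain `=` names the aggregated column.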

Update

If we need the max and min values,

dt[dt[order(time), .I[c(1, .N)], .(month, day)]$V1]
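
A variant of the same `.I` idea that does not depend on sorting, sketched on made-up data (the table and the name `idx` are assumptions for illustration). It collects the row numbers of the earliest and latest `time` within each group and then subsets the original table with them.

```r
library(data.table)

# Hypothetical toy data: two days, several readings per day
dt <- data.table(
  month = c(6, 6, 6, 6, 6),
  day   = c(17, 17, 17, 18, 18),
  time  = c(900, 830, 1015, 1200, 745),
  var1  = c(10, 20, 30, 40, 50)
)

# .I holds the row numbers of dt for the current group;
# pick the rows with the smallest and largest time in each group
idx <- dt[, .I[c(which.min(time), which.max(time))], by = .(month, day)]$V1

# Subset the original table with those row numbers
print(dt[idx])
```

If a group has only one row, that row appears twice; wrapping the index in `unique()` avoids this.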
akrun
0

Try this (using dplyr)

library(dplyr)
dt %>%
  group_by(month, date) %>%
  filter(time == min(time))

Or

dt %>%
  group_by(month, date) %>%
  top_n(1, -time)

EDIT: To get the min value for each date:

dt %>%
  group_by(month, date) %>%
  top_n(1, -var1)

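For both min and max (sort within each group first, so that the first and last rows are the extremes):

dt %>%
  group_by(month, date) %>%
  # sort by time within each day (use var1 instead for the min/max value)
  arrange(month, date, time) %>%
  filter(row_number() %in% c(1, n()))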
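
A small runnable sketch of the same ideas on made-up data, assuming dplyr 1.0.0 or later for `slice_min()` and the `.groups` argument. The data frame and the names `df`, `earliest` and `both` are hypothetical; the column is called `date` to match the code above, while the question calls it `day`.

```r
library(dplyr)

# Hypothetical toy data: two dates, several readings per date
df <- data.frame(
  month = c(6, 6, 6, 6, 6),
  date  = c(17, 17, 17, 18, 18),
  time  = c(900, 830, 1015, 1200, 745),
  var1  = c(10, 20, 30, 40, 50)
)

# Row with the earliest time per (month, date)
earliest <- df %>%
  group_by(month, date) %>%
  slice_min(time, n = 1) %>%
  ungroup()

# One summary row per (month, date): var1 at the earliest and latest time
both <- df %>%
  group_by(month, date) %>%
  summarise(
    first_var1 = var1[which.min(time)],
    last_var1  = var1[which.max(time)],
    .groups = "drop"
  )

print(earliest)
print(both)
```

The `summarise()` form returns exactly one row per day, which avoids the repeated rows described in the question.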
Felipe Gerard