Aggregating using data.table or dplyr in R

Question

I want to aggregate a dataset that includes time, date, and other variables. I ran into a problem when trying to record the earliest value of one variable during each day. I tried

dt[, .(new_var1 := dt[time==(min(time)), .(var1)), by = .(month,day)]

But it returns many repeated rows: for a single day, there are many rows.

In case you're wondering about the negative reception of this question, it probably has something to do with the nonsensical code you say you tried (which has brackets that do not match up). It's also preferred that you make a small reproducible example, as covered here: http://stackoverflow.com/a/28481250/ — Frank, Jun 17 '16 at 18:22

akrun · Answer 1 · 2016-06-17T17:16:32.330

2

We can use

library(data.table)
dt[order(time), head(.SD, 1L), .(month, day)]

Update

If we need the rows with the earliest and the latest time in each group,

dt[dt[order(time), .I[c(1, .N)], .(month, day)]$V1]
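A minimal reproducible sketch of the two calls above, plus one way to get the earliest and the maximum value side by side. The toy data are made up for illustration; only the column names month, day, time and var1 come from the question, and the names first_rows, first_last and both are hypothetical.

```r
library(data.table)

# Toy data (hypothetical values): two days, three readings each
dt <- data.table(
  month = c(6, 6, 6, 6, 6, 6),
  day   = c(1, 1, 1, 2, 2, 2),
  time  = c(900, 800, 1000, 1200, 700, 1100),
  var1  = c(5, 3, 9, 4, 2, 8)
)

# One row per day: the row with the earliest time
first_rows <- dt[order(time), head(.SD, 1L), by = .(month, day)]

# Two rows per day: the rows with the earliest and the latest time
idx <- dt[order(time), .I[c(1, .N)], by = .(month, day)]$V1
first_last <- dt[idx]

# One row per day: var1 at the earliest time next to the largest var1
both <- dt[order(time),
           .(first_var1 = var1[1L], max_var1 = max(var1)),
           by = .(month, day)]
```

Note that a day with a single row would appear twice in first_last, because c(1, .N) then selects the same row two times; wrapping it in unique() is one way around that.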

edited Jun 17 '16 at 17:16

answered Jun 17 '16 at 17:10


Thanks! Is it possible to get the max values and the earliest value at the same time? – zack Jun 17 '16 at 17:14

Felipe Gerard · Answer 2 · 2016-06-17T17:16:13.527

0

Try this (using dplyr)

library(dplyr)
dt %>%
  group_by(month, date) %>%
  filter(time == min(time))

Or

dt %>%
  group_by(month, date) %>%
  top_n(1, -time)

EDIT: To get the min value for each date:

dt %>%
  group_by(month, date) %>%
  top_n(1, -var1)

For both min and max:

dt %>%
  group_by(month, date) %>%
  arrange(month, date, var1) %>%
  filter(row_number() %in% c(1, n()))
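A small self-contained sketch of the dplyr approach on made-up data. It assumes the grouping columns are named month and day as in the question (the snippets in this answer call the second one date), and the values and the names earliest, min_max and per_day are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical values): two days, three readings each
dt <- data.frame(
  month = c(6, 6, 6, 6, 6, 6),
  day   = c(1, 1, 1, 2, 2, 2),
  time  = c(900, 800, 1000, 1200, 700, 1100),
  var1  = c(5, 3, 9, 4, 2, 8)
)

# Row with the earliest time in each day
earliest <- dt %>%
  group_by(month, day) %>%
  filter(time == min(time)) %>%
  ungroup()

# Rows holding the smallest and the largest var1 in each day
min_max <- dt %>%
  group_by(month, day) %>%
  filter(var1 %in% range(var1)) %>%
  ungroup()

# One row per day: var1 at the earliest time next to the largest var1
per_day <- dt %>%
  group_by(month, day) %>%
  summarise(
    first_var1 = var1[which.min(time)],
    max_var1   = max(var1),
    .groups    = "drop"
  )
```

The filter() variants keep every tied row, so a day with several rows at the minimum time returns more than one row; summarise() always collapses to one row per group.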

edited Jun 17 '16 at 17:16

answered Jun 17 '16 at 17:10


Thanks! But I also want to get the max or min values, not only the earliest. – zack Jun 17 '16 at 17:12
I edited the answer. You can adjust the `wt` input on `top_n` or use the filter option: `filter(var1 == min(var1))` – Felipe Gerard Jun 17 '16 at 17:17
