DT[, .(mu = mean(profit)), by = .(day)]
#    day  mu
# 1: Sun 300
# 2: Mon 100
# 3: Fri 105
# 4: Wed 600
# 5: Tue 300
# 6: Thu 325
# 7: Sat 175
Data:
library(data.table)
DT <- fread(text = "
total_sales profit day
1200 300 Sun
800 100 Mon
400 105 Fri
1900 600 Wed
900 300 Tue
1100 450 Thu
2300 200 Thu
550 175 Sat")
This is a dupe, but I'll step through why your `DT[df$profit,mean,df$day]` is a little flawed.
- `data.table` (and `dplyr`) generally (but not always) use non-standard evaluation for variable references, so `day` not `"day"`, etc;
- you use both `DT` and `df`; I'm not certain whether you have two variables and are mistakenly using one with the other; I'll use just `DT` here;
- there are times when you may want to reference `DT$` inside the brackets, but these are certainly the exception ... just go with variable names (as symbols);
- `i=` is looking for something truthy-like, as in a logical vector (whether to select each row) or an integer vector (which rows to select), and since your `profit` is an integer column, it will be looking for row 300, row 100, row 105, etc ... which obviously don't exist;
- `j=` should be an assignment-like operation, so something like `newval := somefunc(...)`, or a summary operation, like `.(newval = somefunc(...))`; not a `FUN=`, as in `tapply`, `lapply`, `Map`, `by`, etc;
- your `by=` call is generally fine unlabeled; once you remove the `df$`, `day` by itself is fine. I personally tend to be explicit with `by=` (so `by=day` here), as much to be declarative and make my intention easier to understand as anything else (other named arguments here might include `.SDcols=` and `on=`, for various purposes).
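
To make the points about `i=`, `j=` and `by=` concrete, here is a minimal sketch using the same sample data. The names `bad`, `high`, `by_day` and `DT2` are just illustrative, and the comments describe the behaviour I'd expect from a reasonably recent `data.table`, so treat them as a guide rather than a guarantee.

```r
library(data.table)

# Same sample data as above.
DT <- fread(text = "
total_sales profit day
1200 300 Sun
800 100 Mon
400 105 Fri
1900 600 Wed
900 300 Tue
1100 450 Thu
2300 200 Thu
550 175 Sat")

# i as an integer vector means "row numbers". Using the profit values
# asks for rows 300, 100, 105, ... of an 8-row table, so every
# requested row comes back filled with NA.
bad <- DT[DT$profit]

# i as a logical vector (built from a bare column name) keeps the
# rows where the condition is TRUE.
high <- DT[profit > 250]

# j as a summary wrapped in .(): one row per group.
by_day <- DT[, .(mu = mean(profit)), by = day]

# j as an assignment with := adds a column by reference and keeps
# every row; copy() is used so the original DT is left untouched.
DT2 <- copy(DT)
DT2[, mu := mean(profit), by = day]
```

The summary form collapses to one row per `day`, while the `:=` form keeps all eight rows and repeats the group mean on each row of that group. Which one you want depends on whether you need the summary table or an extra column.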