
My df looks like this:

total_sales    profit    day
1200            300      Sun
800             100      Mon
400             105      Fri
1900            600      Wed
900             300      Tue
1100            450      Thu
2300            200      Thu
550             175      Sat

.......

I want to find the average profit per day.

What did I do?

  • I tried the following using data.table:

    As I understand it, the general data.table form is:
    DT[select rows, what to do, group by]
    
    DT[df$profit, mean, df$day]
    

However, it does not work. Please advise; I need to do this using data.table.

floss
  • `df[, mean(profit), day]` – JDG Oct 05 '20 at 18:19
  • Are you saying that the values below are not the averages you expect? If so, then please provide what you think (at least one of) the average values *should be*. – r2evans Oct 05 '20 at 18:22
  • Your `DT[df$profit,mean,df$day]` is nothing like `data.table`'s normal (NSE) nomenclature. First, using `DT[i,j,by]` nomenclature, an `i=` of `df$profit` is treated as a vector of row numbers to select (row 300, row 100, ...), which is not what you mean. Your `j=mean` is returning a function, not a summary of values. Your `df$day` should be `by=day`, not `df$` ... though it works here, it's generally bad practice to reference `DT$` inside the brackets unless you must use it (for various reasons ... extraneous, possibly "wrong" here). – r2evans Oct 05 '20 at 18:29
  • The `j=` argument is list-like values/function-calls/similar, not a `FUN=` that you see in other functions (e.g., `lapply`, `tapply`, `by`). – r2evans Oct 05 '20 at 18:30

1 Answer

DT[, .(mu = mean(profit)), by = .(day)]
#    day  mu
# 1: Sun 300
# 2: Mon 100
# 3: Fri 105
# 4: Wed 600
# 5: Tue 300
# 6: Thu 325
# 7: Sat 175

Data:

DT <- fread(text = "
total_sales    profit    day
1200            300      Sun
800             100      Mon
400             105      Fri
1900            600      Wed
900             300      Tue
1100            450      Thu
2300            200      Thu
550             175      Sat")

This is a dupe, but I'll step through why your DT[df$profit,mean,df$day] is a little flawed.

  • data.table (and dplyr) generally (but not necessarily always) use non-standard evaluation for variable reference, so day not "day", etc;
  • you use both DT and df, I'm not certain if you have two variables and are mistakenly using one with the other; I'll use just DT here;
  • there are times when you may want to reference DT$ inside the brackets, but these are certainly the exception ... just go with variable names (as symbols);
  • i= is looking for something truthy-like, as in a logical vector (whether to select each row) or an integer vector (which rows to select), and since your profit is an integer column, it will be looking for row 300, row 100, row 105, etc., which obviously don't exist;
  • j= should be an assignment-like operation, so something like newval := somefunc(...), or a summary operation, like .(newval = somefunc(...)); not a FUN=, as in tapply, lapply, Map, by, etc.;
  • your by= argument is generally fine unlabeled: once you remove the df$, day by itself works. I personally tend to be explicit with by= (so by=day here), as much to be declarative and make my intention easier to understand as anything else (other named arguments here might include .SDcols= and on=, for various purposes).
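
To make the points above concrete, here is a minimal sketch of what each of i, j, and by does, using the same sample rows. The names high, avg, avg_high, day_mean, and base_avg are just illustrative choices of mine, and the profit > 150 cutoff is arbitrary:

```r
library(data.table)

# Same sample rows as in the Data section of this answer
DT <- fread(text = "total_sales    profit    day
1200            300      Sun
800             100      Mon
400             105      Fri
1900            600      Wed
900             300      Tue
1100            450      Thu
2300            200      Thu
550             175      Sat")

# i: a logical vector picks rows; columns are referenced as bare symbols
high <- DT[profit > 150]

# j: an expression evaluated on the columns; by: the grouping column
avg <- DT[, .(mu = mean(profit)), by = day]

# i, j, and by together: filter rows first, then summarise per group
avg_high <- DT[profit > 150, .(mu = mean(profit)), by = day]

# := in j adds a column by reference and keeps every original row
DT[, day_mean := mean(profit), by = day]

# For contrast, the FUN= style from base R that j does not use
base_avg <- tapply(DT$profit, DT$day, mean)
```

The summary form (.(...)) returns one row per group, while := keeps all rows and repeats the group mean on each, so which one to use depends on whether you want a summary table or an extra column.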
r2evans