All I want to do is a simple conditional average (just like the AVERAGEIF function in Excel). I am working with data.table for efficiency, as my tables are rather large (~1m rows).
My aim is to look up, for each row of table 1, the average alpha of the matching individual in table 2.
Table 1
| individual id | date       |
|---------------|------------|
| 1             | 2018-01-02 |
| 1             | 2018-01-03 |
| 2             | 2018-01-02 |
| 2             | 2018-01-03 |
Table 2
| individual id | date2      | alpha |
|---------------|------------|-------|
| 1             | 2018-01-02 | 1     |
| 1             | 2018-01-04 | 1.5   |
| 1             | 2018-01-05 | 1     |
| 2             | 2018-01-01 | 2     |
| 2             | 2018-01-02 | 1     |
| 2             | 2018-01-05 | 4     |
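A minimal sketch for reproducing the two tables above in R. The column names follow the individual_id, date, date2 and alpha used in the code further down; storing the dates as Date (rather than character or IDate) is an assumption.

```r
library(data.table)

# Table 1: the rows that should receive the average
table1 <- data.table(
  individual_id = c(1, 1, 2, 2),
  date = as.Date(c("2018-01-02", "2018-01-03", "2018-01-02", "2018-01-03"))
)

# Table 2: the dated alpha observations to be averaged
table2 <- data.table(
  individual_id = c(1, 1, 1, 2, 2, 2),
  date2 = as.Date(c("2018-01-02", "2018-01-04", "2018-01-05",
                    "2018-01-01", "2018-01-02", "2018-01-05")),
  alpha = c(1, 1.5, 1, 2, 1, 4)
)
```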
Target result
Updated table 1
| individual id | date       | mean(alpha) |
|---------------|------------|-------------|
| 1             | 2018-01-02 | 1           |
| 1             | 2018-01-03 | 1           |
| 2             | 2018-01-02 | 1.5         |
| 2             | 2018-01-03 | 1.5         |
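Each value is simply the mean of all alpha values for that individual in table 2 whose date2 falls on or before the row's date. For example, for individual 2 on 2018-01-02, the rows dated 2018-01-01 (alpha 2) and 2018-01-02 (alpha 1) qualify, giving 1.5. The result can be produced by the following MySQL command, but I am unable to reproduce it in R.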
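update table1
set daily_alpha_avg =
-- avg() skips NULLs, so rows dated after table1.date are left out of the mean
(select avg(case when table2.date2 <= table1.date then table2.alpha end)
from table2
where table2.individual_id = table1.individual_id
group by table2.individual_id);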
My best guess so far is:
table1[table2, on = .(individual_id, date>=date2),
.(x.individual_id, x.date, bb = mean(alpha)), by= .(x.date, x.individual_id)]
or
table1[, daily_alpha_avg := table2[table1, mean(alpha), on =.(individual_id, date>=date2)]]
but this isn't working. I know it's wrong; I just don't know how to fix it.
Thanks for any help.