I ran into something today when using the dot placeholder (.) with the magrittr pipe (%>%) that I don't quite understand. Now I am not sure I understand either operator.
Data
library(data.table)
library(magrittr)
set.seed(1)
df <- setDT(data.frame(id = sample(1:5, 10, replace = TRUE), value = runif(10)))
Why are these three equivalent?
df[, .(Mean = mean(value)), by = .(id)] %>% .$Mean %>% sum()
[1] 3.529399
df[, .(Mean = mean(value)), by = .(id)] %>% {sum(.$Mean)}
[1] 3.529399
sum(df[, .(Mean = mean(value)), by = .(id)]$Mean)
[1] 3.529399
But why is this answer so different?
df[, .(Mean = mean(value)), by = .(id)] %>% sum(.$Mean)
[1] 22.0588
Could someone explain how the pipe operator actually works with respect to the . placeholder? I used to think of . as meaning "go fetch whatever sits on the left of the %>%".
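For reference, the magrittr documentation for %>% states that the placeholder is still used as the first argument when it appears only nested inside the right-hand side call, and that wrapping the right-hand side in braces overrides this. A minimal sketch of that rule on a plain numeric vector (v is an illustrative assumption, not data from the question):

```r
library(magrittr)

v <- c(1, 2, 3)

# 1. No dot: the LHS becomes the first argument -> sum(v)
a <- v %>% sum()
# 2. Dot as a direct argument: the LHS goes only where the dot is -> sum(v)
b <- v %>% sum(.)
# 3. Dot only nested inside another call: the LHS is STILL inserted first
#    -> sum(v, max(v)) = 6 + 3
d <- v %>% sum(max(.))
# 4. Braces switch off first-argument insertion -> sum(max(v))
e <- v %>% { sum(max(.)) }

c(a, b, d, e)   # 6 6 9 3
```

Case 3 is the same shape as sum(.$Mean): .$Mean is a nested use of the dot, not the dot itself.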
Investigation that left me more confused
I tried replacing sum with print to see what was actually going on:
# As Expected
df[, .(Mean = mean(value)), by = .(id)] %>% .$Mean %>% print()
[1] 0.5111589 0.7698414 0.7475319 0.9919061 0.5089610
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean) %>% sum()
[1] 3.529399
# Surprised
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean)
id Mean
1: 1 0.5111589
---
5: 3 0.5089610
# Same
df[, .(Mean = mean(value)), by = .(id)] %>% sum(print(.$Mean))
[1] 22.0588
# Utterly Confused
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean) %>% sum()
[1] 18.5294 #Not even the same as above??
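One way to see what the pipe actually hands to print() is to swap in a function that reports its arguments. The sketch below uses a hypothetical helper, show_args() (not part of magrittr or base R), which, like print(), returns its first argument invisibly; it runs on a small base data frame rather than the data.table above:

```r
library(magrittr)

x <- data.frame(x1 = 1:3, x2 = 4:6)

# Hypothetical helper: reports how many arguments it received and,
# like print(), returns its first argument invisibly
show_args <- function(...) {
  args <- list(...)
  cat("received", length(args), "argument(s)\n")
  invisible(args[[1]])
}

out1 <- x %>% .$x1 %>% show_args()   # received 1 argument(s)
out2 <- x %>% show_args(.$x1)        # received 2 argument(s)

sum(out1)   # 6: only x$x1 was carried forward
sum(out2)   # 21: the whole data frame was carried forward
```

Under that reading, print(.$Mean) receives the whole table as its first argument and passes it on, so the following sum() adds up every cell of the table, id column included.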
Edit: this seems to have nothing to do with data.table or with how it was grouped; the same thing happens with a plain data.frame:
x <- data.frame(x1 = 1:3, x2 = 4:6)
sum(x$x1)
# [1] 6
sum(x$x2)
# [1] 15
x %>% .$x1 %>% sum
# [1] 6
x %>% .$x2 %>% sum
# [1] 15
# Why?
x %>% sum(.$x1)
# [1] 27
x %>% sum(.$x2)
# [1] 36
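Assuming the nested-dot rule from the magrittr documentation, both numbers decompose as sum(x) plus the selected column, since sum() on a data frame adds up every cell. A small check of that arithmetic:

```r
library(magrittr)

x <- data.frame(x1 = 1:3, x2 = 4:6)

# sum() over a data frame adds every cell: 1+2+3 + 4+5+6 = 21
total <- sum(x)

# The dot is nested inside .$x1, so x is still passed first: sum(x, x$x1)
stopifnot((x %>% sum(.$x1)) == sum(x, x$x1))
stopifnot((x %>% sum(.$x1)) == total + sum(x$x1))   # 21 + 6  = 27
stopifnot((x %>% sum(.$x2)) == total + sum(x$x2))   # 21 + 15 = 36
```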