Accessing group data frame information within a group_by() - .data, cur_group() etc

Question

Thanks in advance wonderful community!

I'd like to know the "Rthonic" way to do the following:

I have hierarchical data. The dependent variable is EucDistance. The measurements are by stride, but each stride measurement is classified by dog, spacing, and stride.
I would like to normalize the data in the following manner:
I would like to find the average EucDistance at a specific spacing level within each dog.
I would then like to divide every measurement within a dog by that average measurement for that specific dog (at the specific spacing).

I have code that works, but I think it probably is ugly, and not very 'dplyr-thonic'.

Any input appreciated!

Data

> dff
      X Unnamed..0        dog spacing trial stride      dxreal       dyreal EucDistance
1     0          0    Adeline      80     1      1 -0.45589009  0.414136003   0.6159094
2     1          1    Adeline      80     1      2 -0.93391057 -0.179980238   0.9510951
3     2          2    Adeline      80     1      3 -1.01821020 -0.116316204   1.0248324
4     3          3    Adeline      80     1      4 -1.09724180  0.097720545   1.1015847
5     4          4    Adeline      80     1      5 -1.22770320  0.330795638   1.2714877
6     5          5    Adeline      80     1      6 -0.68055384  0.270466614   0.7323290
7     6          6    Adeline      80     1      7 -1.09646677 -0.293996494   1.1351975
8     7          7    Adeline      80     1      8 -0.80943726 -0.067219664   0.8122236
9     8          8    Adeline      80     1      9 -0.93802071 -0.210576178   0.9613663
10    9          9    Adeline      80     1     10 -0.66246537 -0.010406609   0.6625471
11   10         10    Adeline      80     1     11 -0.95442786  0.145040783   0.9653856
12   11         11    Adeline      80     1     12 -0.93239493  0.023382697   0.9326881
13   12         12    Adeline      80     1     13 -0.77037194 -0.055372287   0.7723594
14   13         13    Adeline      80     1     14 -1.02989032 -0.212016413   1.0514871
15   14          0    Adeline     100     8      1 -0.50987665  0.024829281   0.5104808
16   15          1    Adeline     100     8      2 -0.66623042  0.170542959   0.6877121
17   16          2    Adeline     100     8      3 -0.87581367 -0.053335794   0.8774362
18   17          3    Adeline     100     8      4 -0.78444888  0.094921960   0.7901710
19   18          4    Adeline     100     8      5 -0.85617188 -0.076187146   0.8595550
20   19          5    Adeline     100     8      6 -1.08923768 -0.202441160   1.1078904
21   20          6    Adeline     100     8      7 -1.20453972  0.656030263   1.3716019
22   21          7    Adeline     100     8      8 -0.85432411 -0.504055249   0.9919382
23   22          8    Adeline     100     8      9 -0.81412760 -0.010857375   0.8142000
24   23          9    Adeline     100     8     10 -0.85715425 -0.217992655   0.8844401
25   24         10    Adeline     100     8     11 -0.94484594 -0.035477509   0.9455118
26   25         11    Adeline     100     8     12 -0.74691913 -0.142781463   0.7604438
27   26         12    Adeline     100     8     13 -0.89114523  0.160519197   0.9054867
28   27         13    Adeline     100     8     14 -0.95289045  0.138563387   0.9629123
29   28         14    Adeline     100     8     15 -0.77503600 -0.090543686   0.7803070
30   29         15    Adeline     100     8     16 -0.84674075 -0.206332212   0.8715176
31   39          9    Adeline     120     2     10 -0.08859257  0.047135442   0.1003514
32   40         10    Adeline     120     2     11 -0.55963539 -0.025532526   0.5602175
33   41         11    Adeline     120     2     12 -1.01470291 -0.041777262   1.0155626
34   42         12    Adeline     120     2     13 -1.22524200 -0.104103392   1.2296567
35   43         13    Adeline     120     2     14 -1.27957068  0.244388421   1.3026998
36   44         14    Adeline     120     2     15 -1.23241312 -0.073136012   1.2345813
37   45         15    Adeline     120     2     16 -0.76264635 -0.122637271   0.7724438
38   46          0     Bailey      80     4      1  0.54021404 -0.412684919   0.6798088
39   47          1     Bailey      80     4      2  0.92813722  0.084193405   0.9319481
40   48          2     Bailey      80     4      3  0.97782303  0.465218408   1.0828509
41   49          3     Bailey      80     4      4  1.04957875  0.098643025   1.0542040
42   50          4     Bailey      80     4      5  1.25466959 -0.326264279   1.2963966

Code:

test <- dff %>%
  group_by(dog, spacing) %>%
  mutate(grpMean = mean(EucDistance))

test2 <- test %>%
  group_by(dog) %>%
  mutate(normED = EucDistance / .data$grpMean[.data$spacing == 100][1])

This code seems unnecessarily long (two steps) and kludgy (the [1] to access the first element).

Any input appreciated!

It works! Just want to know a better, elegant way.

score 0 · Answer 1 · answered Mar 03 '23 at 16:51

Your requirements seem complex enough that two steps might be unavoidable. If you compress them into one step, you'll likely end up with code that is harder to read. So, focusing on your second complaint (the [1] indexing), what if you just make the code a little more readable?

You can start with using a function for calculating the group means where spacing is 100:

group_means <- function(df) {
  df %>%
    filter(spacing == 100) %>%
    group_by(dog) %>%
    summarize(grpMean = mean(EucDistance), .groups = "drop")
}

Then use a left_join with your new function to get something clean:

test2 <- dff %>%
  left_join(group_means(dff), by = "dog") %>%
  mutate(normED = EucDistance / grpMean)
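
As a quick check of the join approach, here is a minimal sketch on made-up data. The toy tibble and the names toy, ref_means and joined are hypothetical and not from the question. It also shows one thing to watch for: a dog with no rows at spacing 100 has no match in the join, so its normED comes out as NA.

```r
library(dplyr)

# Hypothetical toy data: dog "C" has no measurements at spacing 100
toy <- tibble(
  dog = c("A", "A", "B", "B", "C"),
  spacing = c(80, 100, 100, 120, 80),
  EucDistance = c(1, 2, 6, 9, 5)
)

# Per-dog reference mean, computed only from rows at spacing 100
ref_means <- toy %>%
  filter(spacing == 100) %>%
  group_by(dog) %>%
  summarize(grpMean = mean(EucDistance), .groups = "drop")

# Attach each dog's reference mean to every row, then normalize
joined <- toy %>%
  left_join(ref_means, by = "dog") %>%
  mutate(normED = EucDistance / grpMean)

joined
```

If every dog is expected to have spacing 100 data, checking anyNA(joined$grpMean) after the join is a cheap way to catch missing reference groups.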

score 0 · Answer 2 · answered Mar 04 '23 at 17:22

Hey @HarrisonJones thank you so much -- great suggestion!

I realized I was being a dummy and mutate can do it all in one line just fine. I got hung up on trying to use square-bracket indexing. With tibbles, single-bracket indexing returns a tibble even for one column, so something like mean(stuff[, 'fubar']) does not work on the resulting one-column tibble; you need $ indexing to get a vector. This is by design. So this one operation works great:

test2 <- test %>%
  group_by(dog) %>%
  mutate(normED = EucDistance / mean(.data$EucDistance[.data$spacing == 100]))
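
For anyone who wants to try the one-step version, here is a minimal sketch on made-up data (the toy tibble and the name normed are hypothetical). Inside mutate() the columns are already plain vectors for the current group, so the .data$ prefix is optional and bare column names do the same job.

```r
library(dplyr)

# Hypothetical toy data: two dogs, each with rows at spacing 100
toy <- tibble(
  dog = c("A", "A", "A", "B", "B", "B"),
  spacing = c(80, 100, 100, 80, 100, 120),
  EucDistance = c(1, 2, 4, 3, 6, 9)
)

# Within each dog, divide every EucDistance by that dog's mean at spacing 100
normed <- toy %>%
  group_by(dog) %>%
  mutate(normED = EucDistance / mean(EucDistance[spacing == 100])) %>%
  ungroup()

normed
```

Note that if a dog has no rows at spacing 100, the subset is empty and mean() of an empty numeric vector is NaN, so that dog's normED values would all be NaN.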

Actually, I think you can use square brackets on a tibble; it just needs drop = TRUE. So something like mean(data[rows, 'column', drop = TRUE]) would work, because drop turns the single-column tibble into a vector. Info here:

Column referencing after dplyr doesn't work
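
To make the bracket behavior concrete, here is a small sketch on a made-up tibble (tb and the variable names are hypothetical). It shows that single-bracket selection of one column stays a tibble, while $, [[ ]] and drop = TRUE each give back a plain vector that mean() can use.

```r
library(tibble)

# Hypothetical toy tibble
tb <- tibble(
  dog = c("Adeline", "Adeline", "Bailey"),
  EucDistance = c(0.5, 1.5, 1.0)
)

# Single-bracket selection of one column still returns a tibble
one_col <- tb[, "EucDistance"]
still_tibble <- is_tibble(one_col)

# Three ways to get a plain numeric vector instead
v_dollar <- tb$EucDistance
v_double <- tb[["EucDistance"]]
v_drop <- tb[tb$dog == "Adeline", "EucDistance", drop = TRUE]

# mean() works on the vector
adeline_mean <- mean(v_drop)
```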

Thank you so much for following up!
