creating an array of grouped values (means)

Question

I have a large dataset ("bsa", drawn from a 23-year period) which includes a variable ("leftrigh") for "left-right" views (political orientation). I'd like to summarise how the cohorts change over time. For example, in 1994 the average value of this scale for people aged 45 was (say) 2.6; in 1995 the average value of this scale for people aged 46 was (say) 2.7 -- etc etc. I've created a year-of-birth variable ("yrbrn") to facilitate this.

I've successfully created the means:

bsa <- bsa %>% group_by(yrbrn, syear) %>% mutate(meanlr = mean(leftrigh))

Where I'm struggling is to summarise the means by year (of the survey) and age (at the time of the survey). If I could create an array (containing these means) organised by age x survey-year, I could see the change over time by inspecting the diagonals. But I have no clue how to do this -- my skills are very limited...

A tibble: 66,744 x 10
Groups:   yrbrn [104]
     Rsex     Rage  leftrigh OldWt syear     yrbrn   coh   per agecat  meanlr
1 1 [Male]      40  1 [left] 1.12   2017      1977    17  2017 [37,47)   2.61
2 2 [Female]    79  1.8      0.562  2017      1938     9  2017 [77,87)   2.50
3 2 [Female]    50  1.5      1.69   2017      1967    15  2017 [47,57)   2.59
4 1 [Male]      73  2        0.562  2017      1944    10  2017 [67,77)   2.57
5 2 [Female]    31  3        0.562  2017      1986    19  2017 [27,37)   2.56
6 1 [Male]      74  2.2      0.562  2017      1943    10  2017 [67,77)   2.50
7 2 [Female]    58  2        0.562  2017      1959    13  2017 [57,67)   2.56
8 1 [Male]      59  1.2      0.562  2017      1958    13  2017 [57,67)   2.53
9 2 [Female]    19  4        1.69   2017      1998    21  2017 [17,27)   2.46

Possible format for presenting this information to see change over time:

        1994  1995  1996  1997  1998  1999  2000  
18  
19  
20  
21  
22  
23  
24  
25  
etc.

Welcome to StackOverflow! Read how to create a [reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). You can use `dput` to provide us with a sample of your data. — , Jun 16 '19 at 10:59
Data added. Not sure how to indicate the desired output -- I'm open-minded as to what format to use, though an array where age gives rows and survey-year gives columns might be useful. — dbartram, Jun 16 '19 at 10:59
yes -- I've changed the code above, to group_by yrbrn and syear. — dbartram, Jun 16 '19 at 11:06
Possible duplicate of [Group by multiple columns in dplyr, using string vector input](https://stackoverflow.com/questions/21208801/group-by-multiple-columns-in-dplyr-using-string-vector-input) — NelsonGon, Jun 16 '19 at 11:10
I've added something re the kind of format I have in mind. If the cells in this array contained the means I've created, I could see the change over time by reading along the diagonals. — dbartram, Jun 16 '19 at 11:14
I guess `group_by` is what you are looking for. Never mind, there is already an answer posted. — Jason Mathews, Jun 16 '19 at 11:14
RE possible duplicate, it seems the main issue is that the user is trying to shape the data into a matrix, not an issue of group_by for >1 variable. — rg255, Jun 16 '19 at 11:22
@rg255 the OP says the matrix is a "possible format", and then in a comment above says they are open to other options. — , Jun 16 '19 at 11:25

rg255 · Accepted Answer · 2019-06-16T11:18:09.617

You can group_by both year of birth (the cohort) and survey year at the same time, then spread the survey years into columns:

# Setup (& make reproducible data...)
library(dplyr)  # group_by(), summarise(), %>%
library(tidyr)  # spread()

n <- 10000
df1 <- data.frame(
  'yrbrn' = sample(1920:1995, size = n, replace = TRUE),
  'Syear' = sample(2005:2015, size = n, replace = TRUE),
  'leftrigh' = sample(seq(0, 5, 0.1), size = n, replace = TRUE))

# Solution
df1 %>% 
  group_by(yrbrn, Syear) %>% 
  summarise(meanLR = mean(leftrigh)) %>% 
  spread(Syear, meanLR)

Produces the following:

# A tibble: 76 x 12
# Groups:   yrbrn [76]
   yrbrn `2005` `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015`
   <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1  1920   3.41   1.68   2.26   2.66   3.21   2.59   2.24   2.39   2.41   2.55   3.28
 2  1921   2.43   2.71   2.74   2.32   2.24   1.89   2.85   3.27   2.53   1.82   2.65
 3  1922   2.28   3.02   1.39   2.33   3.25   2.09   2.35   1.83   2.09   2.57   1.95
 4  1923   3.53   3.72   2.87   2.05   2.94   1.99   2.8    2.88   2.62   3.14   2.28
 5  1924   1.77   2.17   2.71   2.18   2.71   2.34   2.29   1.94   2.7    2.1    1.87
 6  1925   1.83   3.01   2.48   2.54   2.74   2.11   2.35   2.65   2.57   1.82   2.39
 7  1926   2.43   3.2    2.53   2.64   2.12   2.71   1.49   2.28   2.4    2.73   2.18
 8  1927   1.33   2.83   2.26   2.82   2.34   2.09   2.3    2.66   3.09   2.2    2.27
 9  1928   2.34   2.02   2.1    2.88   2.14   2.44   2.58   1.67   2.57   3.11   2.93
10  1929   2.31   2.29   2.93   2.08   2.11   2.47   2.39   1.76   3.09   3      2.9
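
If the age x survey-year layout from the question is preferred instead, a base R sketch along the following lines may work. It rebuilds the same kind of simulated data as above (with a seed added so the run is repeatable) and assumes that age at interview can be approximated as survey year minus year of birth. The `age` column and the name `lr_by_age_year` are introduced here purely for illustration. On the real data, the existing `Rage` variable could presumably be used in place of the computed age.

```r
# Hypothetical toy data mirroring the setup above (not the real "bsa" data)
set.seed(1)
n <- 10000
df1 <- data.frame(
  yrbrn    = sample(1920:1995, size = n, replace = TRUE),
  Syear    = sample(2005:2015, size = n, replace = TRUE),
  leftrigh = sample(seq(0, 5, 0.1), size = n, replace = TRUE)
)

# Assumption: age at survey is approximated as survey year minus birth year
df1$age <- df1$Syear - df1$yrbrn

# tapply() with two grouping vectors returns a matrix:
# rows = age, columns = survey year, cells = mean of leftrigh.
# Age/year combinations with no respondents come out as NA.
lr_by_age_year <- with(
  df1,
  tapply(leftrigh, list(age = age, Syear = Syear), mean, na.rm = TRUE)
)

# Inspect a corner of the matrix
round(lr_by_age_year[1:5, 1:5], 2)
```

In this layout each cohort runs down a diagonal (one row down and one column across per year), which is the reading described in the question. The result is a plain numeric matrix, so it can be indexed by name, e.g. `lr_by_age_year["50", "2010"]`.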

Thanks -- yes, I think that's what I've done. But there are a great many means created by this step. The issue is, how to present them in a way that enables me to see change over time for the various cohorts. — dbartram, Jun 16 '19 at 11:16
this now produces a matrix @dbartram with the cohort/yrbrn in rows so you can see how these change with survey year — rg255, Jun 16 '19 at 11:21
Ah, that looks like what I'm after -- better than what I had in mind, because now I can see the change within the rows (rather than looking at diagonals). Thanks so much! I'll try to do better next time in posing/formatting my question — dbartram, Jun 16 '19 at 11:29
