I have a large dataset ("bsa", drawn from a 23-year period) which includes a variable ("leftrigh") for "left-right" views (political orientation). I'd like to summarise how the cohorts change over time. For example, in 1994 the average value of this scale for people aged 45 was (say) 2.6; in 1995 the average value of this scale for people aged 46 was (say) 2.7 -- etc etc. I've created a year-of-birth variable ("yrbrn") to facilitate this.

I've successfully created the means:

bsa <- bsa %>% group_by(yrbrn, syear) %>% mutate(meanlr = mean(leftrigh))

Where I'm struggling is to summarise the means by year (of the survey) and age (at the time of the survey). If I could create an array (containing these means) organised by age x survey-year, I could see the change over time by inspecting the diagonals. But I have no clue how to do this -- my skills are very limited...

A tibble: 66,744 x 10
Groups:   yrbrn [104]
     Rsex     Rage  leftrigh OldWt syear     yrbrn   coh   per agecat  meanlr
1 1 [Male]      40  1 [left] 1.12   2017      1977    17  2017 [37,47)   2.61
2 2 [Female]    79  1.8      0.562  2017      1938     9  2017 [77,87)   2.50
3 2 [Female]    50  1.5      1.69   2017      1967    15  2017 [47,57)   2.59
4 1 [Male]      73  2        0.562  2017      1944    10  2017 [67,77)   2.57
5 2 [Female]    31  3        0.562  2017      1986    19  2017 [27,37)   2.56
6 1 [Male]      74  2.2      0.562  2017      1943    10  2017 [67,77)   2.50
7 2 [Female]    58  2        0.562  2017      1959    13  2017 [57,67)   2.56
8 1 [Male]      59  1.2      0.562  2017      1958    13  2017 [57,67)   2.53
9 2 [Female]    19  4        1.69   2017      1998    21  2017 [17,27)   2.46

Possible format for presenting this information to see change over time:

        1994  1995  1996  1997  1998  1999  2000  
18  
19  
20  
21  
22  
23  
24  
25  
etc.  
dbartram
  • Can you add some input data and desired output? – Jason Mathews Jun 16 '19 at 10:48
  • Welcome to StackOverflow! Read how to create a [reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). You can use `dput` to provide us with a sample of your data. –  Jun 16 '19 at 10:59
  • Data added. Not sure how to indicate the desired output -- I'm open-minded as to what format to use, though an array where age gives rows and survey-year gives columns might be useful. – dbartram Jun 16 '19 at 10:59
  • Do you mean you want to group_by two things at once? – rg255 Jun 16 '19 at 11:04
  • yes -- I've changed the code above, to group_by yrbrn and syear. – dbartram Jun 16 '19 at 11:06
  • perhaps `arrange(syear, Rage)`? –  Jun 16 '19 at 11:07
  • Possible duplicate of [Group by multiple columns in dplyr, using string vector input](https://stackoverflow.com/questions/21208801/group-by-multiple-columns-in-dplyr-using-string-vector-input) – NelsonGon Jun 16 '19 at 11:10
  • I've added something re the kind of format I have in mind. If the cells in this array contained the means I've created, I could see the change over time by reading along the diagonals. – dbartram Jun 16 '19 at 11:14
  • I guess `group_by` is what you are looking for. Never mind, there is already an answer posted. – Jason Mathews Jun 16 '19 at 11:14
  • Re the possible duplicate: the main issue seems to be that the user is trying to shape the data into a matrix, not how to group_by more than one variable. – rg255 Jun 16 '19 at 11:22
  • @rg255 the OP says the matrix is a "possible format", and then in a comment above says they are open to other options. –  Jun 16 '19 at 11:25

1 Answer


You can group_by both birth year (yrbrn) and survey year at the same time, then spread the survey years into columns:

# Setup (& make reproducible data...)
library(dplyr)  # %>%, group_by(), summarise()
library(tidyr)  # spread()

n <- 10000
df1 <- data.frame(
  'yrbrn' = sample(1920:1995, size = n, replace = TRUE),
  'Syear' = sample(2005:2015, size = n, replace = TRUE),
  'leftrigh' = sample(seq(0, 5, 0.1), size = n, replace = TRUE))

# Solution
df1 %>% 
  group_by(yrbrn, Syear) %>% 
  summarise(meanLR = mean(leftrigh)) %>% 
  spread(Syear, meanLR)

Produces the following:

# A tibble: 76 x 12
# Groups:   yrbrn [76]
   yrbrn `2005` `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015`
   <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1  1920   3.41   1.68   2.26   2.66   3.21   2.59   2.24   2.39   2.41   2.55   3.28
 2  1921   2.43   2.71   2.74   2.32   2.24   1.89   2.85   3.27   2.53   1.82   2.65
 3  1922   2.28   3.02   1.39   2.33   3.25   2.09   2.35   1.83   2.09   2.57   1.95
 4  1923   3.53   3.72   2.87   2.05   2.94   1.99   2.8    2.88   2.62   3.14   2.28
 5  1924   1.77   2.17   2.71   2.18   2.71   2.34   2.29   1.94   2.7    2.1    1.87
 6  1925   1.83   3.01   2.48   2.54   2.74   2.11   2.35   2.65   2.57   1.82   2.39
 7  1926   2.43   3.2    2.53   2.64   2.12   2.71   1.49   2.28   2.4    2.73   2.18
 8  1927   1.33   2.83   2.26   2.82   2.34   2.09   2.3    2.66   3.09   2.2    2.27
 9  1928   2.34   2.02   2.1    2.88   2.14   2.44   2.58   1.67   2.57   3.11   2.93
10  1929   2.31   2.29   2.93   2.08   2.11   2.47   2.39   1.76   3.09   3      2.9
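
If the age x survey-year layout from the question is still wanted, the same idea should work with age in the rows. This is only a minimal sketch under some assumptions: the toy data frame `toy` and the derived `age` column are made up for illustration (the real data already has `Rage`), and it uses `pivot_wider()`, the newer tidyr replacement for `spread()`, which assumes tidyr >= 1.1.0 for `names_sort` and dplyr >= 1.0.0 for `.groups`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data using the question's column names (syear, yrbrn, leftrigh)
set.seed(1)
n <- 5000
toy <- data.frame(
  syear    = sample(1994:2000, size = n, replace = TRUE),
  yrbrn    = sample(1930:1980, size = n, replace = TRUE),
  leftrigh = sample(seq(1, 5, 0.1), size = n, replace = TRUE)
)

age_by_year <- toy %>%
  mutate(age = syear - yrbrn) %>%   # age at the time of the survey (assumed definition)
  group_by(age, syear) %>%
  summarise(meanlr = mean(leftrigh, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = syear, values_from = meanlr, names_sort = TRUE) %>%
  arrange(age)

age_by_year
```

In this layout a cohort is followed down the diagonal: people born in 1964 are in the age 30 row of the `1994` column, the age 31 row of the `1995` column, and so on. Cells with no respondents come out as `NA`.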
rg255
  • Thanks -- yes, I think that's what I've done. But there are a great many means created by this step. The issue is, how to present them in a way that enables me to see change over time for the various cohorts. – dbartram Jun 16 '19 at 11:16
  • this now produces a matrix @dbartram with the cohort/yrbrn in rows so you can see how these change with survey year – rg255 Jun 16 '19 at 11:21
  • Ah, that looks like what I'm after -- better than what I had in mind, because now I can see the change within the rows (rather than looking at diagonals). Thanks so much! I'll try to do better next time in posing/formatting my question – dbartram Jun 16 '19 at 11:29