I have a dataset with multiple observations per year for a variable, and I want to calculate the average of that variable for each year. A sample of the data is included in the edit below.

I want the yearly averages so that I can plot the mean of the VCF0207 variable over time.

I tried using the following code to make a temporary dataframe to get these averages for each year:

temp.df <- df %>%
  group_by(VCF0004) %>%
  summarize(VCF0207 = mean(VCF0207))

That didn't work: the result has 1 observation of 1 variable instead of one row per year. Am I missing something?

EDIT: Here is a bit more about what the data look like:

> dput(head(ts.dat,20))
structure(list(VCF0004 = structure(c(1964, 1964, 1964, 1964, 
1964, 1964, 1964, 1964, 1964, 1964, 1964, 1964, 1964, 1964, 1964, 
1964, 1964, 1964, 1964, 1964), label = "Year of Study", format.stata = "%9.0g"), 
    VCF0207 = structure(c(85, 70, 85, 85, 60, 97, 97, 50, 70, 
    50, 97, 97, 97, 97, 97, 97, 97, 97, 97, 85), label = "Thermometer - Whites", format.stata = "%75.0g", labels = c(`97. 97-100 Degrees` = 97, 
    `98. DK  (exc. 1964-1968: see VCF0201 note)` = 98, `99. NA; no Post IW; form III,IV (1972); breakoff, sufficient partial (2016)` = 99
    ), class = c("haven_labelled", "vctrs_vctr", "double"))), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))
NColl
  • Can you post sample data in `dput` format? Please edit **the question** with the output of `dput(df)`. Or, if it is too big with the output of `dput(head(df, 20))`. – Rui Barradas Oct 13 '20 at 19:30
  • You should have 1 row per unique value of `VCF0004`. If you don't get that, make sure you aren't falling into [the old "loading `plyr` after `dplyr`" trap](https://stackoverflow.com/q/26106146/903061) - try specifying `dplyr::summarize` explicitly. – Gregor Thomas Oct 13 '20 at 19:31
  • Something like: `aggregate(VCF0207 ~ VCF0004, data = df, FUN = mean)` – Lime Oct 13 '20 at 19:33
  • @GregorThomas So I checked that and the resulting output then has a column for each year but then only NA's in the column where I should've gotten my means. – Damon C. Roberts Oct 13 '20 at 19:38
  • Are there missing values in `VCF0207`? If so, use `mean(VCF0207, na.rm = TRUE)`. Compare `mean(c(1, 2, NA))` and `mean(c(1, 2, NA), na.rm = TRUE)`. – Calum You Oct 13 '20 at 20:06
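
The suggestions in the comments can be combined into one rough sketch: call `dplyr::summarize` explicitly (in case `plyr` was loaded after `dplyr`) and pass `na.rm = TRUE` to `mean()`. The toy `df` below is made up for illustration and is not the real survey data. Treating 98 and 99 as missing is an assumption based on the value labels in the `dput` output ("DK" and "NA"), and `as.numeric()` is used on the assumption that it returns the underlying values of a labelled column.

```r
library(dplyr)

# Toy data standing in for the real file (values invented for illustration).
df <- tibble(
  VCF0004 = c(1964, 1964, 1964, 1968, 1968, 1968),
  VCF0207 = c(85, 70, NA, 60, 97, 99)
)

yearly <- df %>%
  # Assumption: 98 ("DK") and 99 ("NA") are missing-value codes, so recode them to NA.
  mutate(
    VCF0207 = as.numeric(VCF0207),
    VCF0207 = if_else(VCF0207 %in% c(98, 99), NA_real_, VCF0207)
  ) %>%
  group_by(VCF0004) %>%
  # The explicit namespace avoids picking up plyr::summarize by accident.
  dplyr::summarize(VCF0207 = mean(VCF0207, na.rm = TRUE))

print(yearly)
```

On the toy data this should give one row per year (77.5 for 1964 and 78.5 for 1968), which is the shape needed for plotting the mean over time.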

0 Answers