I am trying to generate year-wise summary statistics as follows:

data %>%
  group_by(year) %>%
  summarise(mean.abc = mean(abc), mean.def = mean(def), sd.abc = sd(abc), sd.def = sd(def))

This code returns a single row with NA in every column instead of one row per year:

  mean.abc mean.def sd.abc sd.def
1       NA       NA     NA     NA

So I tried to work this out by replicating a standard example:

data(mtcars)

mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp))

This script also returns a single row rather than one row per group:

      mean
1 230.7219

So, what am I doing wrong? I am loading the following packages:

loadpackage( c("foreign","haven", "tidyverse", "plyr", "stringr", "eeptools", "factoextra") )

Thanks for your support!

Daniel2805
  • Since we can not see your data, it is a bit unclear where your issue is. If I had to guess though, I would say that adding `, na.rm = T` to your summary-functions should do the trick. – Max Teflon Mar 12 '21 at 13:43
  • Since my data is confidential, I am not able to post it. But na.rm = TRUE does not do the trick. I think we can work this out relying on the mtcars data. – Daniel2805 Mar 12 '21 at 13:50
  • As mentioned in [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) concerning reproducible examples, you do not have to share your data, it would suffice if you could perhaps give us the output of `dput(head(data))`, you could even change the data by multiplying it with some random noise before sharing it `(mutate(data, across(where(is.numeric),~.*sample(0:10,1))` for example would just do that. – Max Teflon Mar 12 '21 at 13:53
  • So, maybe this rings a bell: I am not able to add white noise using the code snippet above, even if I close the parentheses correctly. – Daniel2805 Mar 12 '21 at 14:04
  • Is there any error message? And does `str(data)` return numeric for your `abc` and `def` columns? – Max Teflon Mar 12 '21 at 14:06
  • my snippet did not make any sense either, it should read `mutate(data, across(where(is.numeric),~sample(0:10,nrow(data), T)))` – Max Teflon Mar 12 '21 at 14:09
  • All three variables are numeric ( abc: num [1:575118] ....). But I also just added a scalar to the column abc and it worked – Daniel2805 Mar 12 '21 at 14:10
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/229825/discussion-between-max-teflon-and-daniel2805). – Max Teflon Mar 12 '21 at 14:11

1 Answer

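Your issue is that `summarise` from the plyr package does not do what you expect: it ignores the grouping set by `group_by()` and collapses the whole data frame into a single row. Since plyr appears after tidyverse in your package list, its `summarise` presumably masks the dplyr version.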

See the difference between:

library(tidyverse)

mtcars %>%
  group_by(cyl) %>%
  plyr::summarise(mean = mean(disp))
#>       mean
#> 1 230.7219

and

mtcars %>%
  group_by(cyl) %>%
  dplyr::summarise(mean = mean(disp))
#> # A tibble: 3 x 2
#>     cyl  mean
#>   <dbl> <dbl>
#> 1     4  105.
#> 2     6  183.
#> 3     8  353.
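
If it is unclear which `summarise` an unqualified call resolves to, base R can tell you. The following is a minimal sketch, assuming plyr is attached after dplyr (as the order in the question's package list suggests); it only uses `environment()`, `environmentName()` and `find()`:

```r
library(dplyr)
library(plyr)  # attached after dplyr, so plyr::summarise masks dplyr::summarise

# Name of the package whose summarise() is found first on the search path
active_pkg <- environmentName(environment(summarise))
active_pkg

# All attached packages that provide an object called "summarise",
# in search-path order (the first entry is the one that gets called)
providers <- find("summarise")
providers
```

If the first entry is plyr, either prefix the call with `dplyr::` as above, or attach plyr before dplyr so that the dplyr version wins.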

Since your data seems to have missing values, this should do the trick:

data %>%
  group_by(year) %>%
  dplyr::summarise(across(all_of(c('abc', 'def')),
                          .fns = list(mean = ~mean(.x, na.rm = TRUE),
                                      sd = ~sd(.x, na.rm = TRUE))))
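
Since the original data cannot be shared, here is a rough reproducible sketch of the same pattern on `mtcars`. The injected missing value and the choice of `disp` and `hp` as stand-ins for `abc` and `def` are assumptions made purely for illustration:

```r
library(dplyr)

# Toy data: mtcars with one missing value injected (illustrative only)
df <- mtcars
df$disp[1] <- NA

res <- df %>%
  group_by(cyl) %>%
  dplyr::summarise(across(all_of(c("disp", "hp")),
                          .fns = list(mean = ~mean(.x, na.rm = TRUE),
                                      sd   = ~sd(.x, na.rm = TRUE))))

# One row per cyl group; columns are named <column>_<function>
res
```

With the explicit `dplyr::` prefix the result has one row per group regardless of whether plyr is attached, and `na.rm = TRUE` keeps the injected `NA` from propagating into the summaries.
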
Max Teflon