I have a data.frame like this (example):

product   protein   fat   starch
  aaa        40      5      10
  bbb        50      6      8
  ccc        12      50     4

and I want a summary of these values (min, max, 1st quartile, 3rd quartile, ...). When I run:

aggregate(protein~product,summary,data=DATA4, na.rm = TRUE)

I get this:

    product protein.Min. protein.1st Qu. protein.Median protein.Mean protein.3rd Qu. protein.Max.
      aaa        6.400          14.700         15.600       15.540          16.600       22.500
      bbb        6.300           9.400         10.100       10.130          10.800       15.100
      ccc       23.000          24.080         24.250       24.180          24.420       25.000

However, I also want the frequency (count) and the standard deviation. How can I get those? I tried ddply but could not make it work. (I have NAs in some variables: protein, fat, starch, ...)

Also, since here I am only asking for a summary of protein, how can I get a summary for every variable I have (protein, fat, starch, etc.) all at once?

Thank you very much!

Ana Raquel
  • something like `aggregate(protein~product,FUN = function(i) c(summary(i, na.rm = TRUE), l1 = length(i), sd1 = sd(i, na.rm = TRUE)), data=DATA4)`? – Sotos Feb 02 '17 at 14:47
  • @Sotos Thank you! It's working, although I cannot understand anything in the code.. x) But do you know why the answer now comes out in this format? aaa 6.400000e+00 1.470000e+01 1.560000e+01 1.554000e+01 1.660000e+01 (in exponential?) – Ana Raquel Feb 02 '17 at 14:59
  • You can avoid scientific notation by using `options(scipen=999)` – Sotos Feb 02 '17 at 15:04
  • @Sotos But where should I put that? In which part of the code? Sorry.. :/ – Ana Raquel Feb 02 '17 at 15:07
  • just `options(scipen = 999) ; aggregate(.......))))` – Sotos Feb 02 '17 at 15:13
  • @Sotos Thank you!! This solved my problem! – Ana Raquel Feb 02 '17 at 15:16
  • Sure. You might also want to review the dupe target for further info – Sotos Feb 02 '17 at 15:17

1 Answer

When I want control over which statistics appear in a summary, I usually turn to a more explicit solution using dplyr, like so:

library(dplyr)

# Example data: 3 products, 9 random protein values each
df <- data.frame(product = rep(letters[1:3], each = 3, times = 3),
                 protein = sample(10:40, 27, replace = TRUE))

# One row per product, one column per statistic
df %>%
  group_by(product) %>%
  summarise(min = min(protein),
            max = max(protein),
            mean = mean(protein),
            sd = sd(protein),
            n = n(),
            q25 = quantile(protein, .25),
            q75 = quantile(protein, .75))

Result:

# A tibble: 3 × 8
  product   min   max     mean       sd     n   q25   q75
   <fctr> <int> <int>    <dbl>    <dbl> <int> <dbl> <dbl>
1       a    16    39 24.66667 8.717798     9    17    30
2       b    24    40 31.55556 5.387743     9    26    35
3       c    13    38 26.66667 8.108637     9    22    31
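
The question also asks about several variables at once and about NAs, which the example above does not cover. A minimal sketch of one way to do that, assuming dplyr 1.0.0 or later (for `across()`); the `DATA4` below is a made-up stand-in for the real data, and the column names are taken from the question:

```r
library(dplyr)

# Hypothetical stand-in for the asker's DATA4, with a few NAs
DATA4 <- data.frame(
  product = rep(c("aaa", "bbb", "ccc"), each = 4),
  protein = c(40, 42, NA, 38, 50, 51, 49, NA, 12, 14, 13, 11),
  fat     = c(5, 6, 5, NA, 6, 7, 6, 6, 50, NA, 48, 52),
  starch  = c(10, 11, 9, 10, 8, NA, 8, 7, 4, 5, NA, 4)
)

# Apply the same set of statistics to every listed column, per product.
# na.rm = TRUE drops missing values; n counts the non-missing values only.
stats <- DATA4 %>%
  group_by(product) %>%
  summarise(across(
    c(protein, fat, starch),
    list(
      min  = ~ min(.x, na.rm = TRUE),
      q25  = ~ quantile(.x, 0.25, na.rm = TRUE, names = FALSE),
      mean = ~ mean(.x, na.rm = TRUE),
      q75  = ~ quantile(.x, 0.75, na.rm = TRUE, names = FALSE),
      max  = ~ max(.x, na.rm = TRUE),
      sd   = ~ sd(.x, na.rm = TRUE),
      n    = ~ sum(!is.na(.x))
    )
  ))

print(stats)
```

The result has one row per product and one column per variable/statistic pair, named like `protein_min`, `fat_sd`, `starch_n`. If you would rather count all rows including NAs, `n()` could be used in place of `sum(!is.na(.x))`.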
Wietze314
  • and you could adapt this to `summarise_all(funs(min, max, median, sd, n = n(), q25 = quantile(., .25), q75 = quantile(., .75)))` if you wanted to apply it to all of the categories (protein, fat and starch) in the original data frame – Nate Feb 02 '17 at 14:54
  • @Wietze314 Thank you for your help! I can see that it works in your case, but when I run that code I get this error: Error in function_list[[i]](value) : could not find function "group_by" – Ana Raquel Feb 02 '17 at 14:55
  • you want to call `library(dplyr)`; those functions are from that package – Nate Feb 02 '17 at 14:58
  • @Wietze314 I know, and I did that already... and when I try again... same error. – Ana Raquel Feb 02 '17 at 15:03
  • Are your packages up to date? Maybe you need to update – Sotos Feb 02 '17 at 15:06
  • I added the `library(dplyr)`. But I do not understand what is going wrong. Did you change `df` to `DATA4`? Your data seems to be stored in a different variable. – Wietze314 Feb 02 '17 at 15:19