
I have two columns in the data set, and I know I have to use the functions ddply and summarise, but I do not know how to start.

  • Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically after `set.seed(1)`), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans May 23 '20 at 15:33

1 Answer


Hopefully this will get you started:

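library(dplyr)  # provides %>%, group_by() and summarise()

# Mean and standard deviation of Salary within each Satisfaction level
data %>%
  group_by(Satisfaction) %>%
  summarise(Mean = mean(Salary),
            SD = sd(Salary))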
# A tibble: 7 x 3
  Satisfaction    Mean     SD
         <int>   <dbl>  <dbl>
1            1  12481.  1437.
2            2  31965.  5235.
3            3  45844.  7631.
4            4  69052.  9257.
5            5  79555. 12975.
6            6 100557. 13739.
7            7 111414. 19139.

First, use the group_by verb to group the data by the variable you are interested in. Then, as you alluded to, use the summarise verb to apply a function to each group. You can compute several summaries at once by separating the new columns with commas.
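Since the question mentions ddply, here is a rough sketch of the same summary with plyr instead of dplyr. It assumes the plyr package is installed and uses the example data from this answer; the result name `by_group` is just for illustration.

```r
# Assumes the plyr package is installed. Functions are called with plyr::
# so they do not clash with dplyr's summarise() if both packages are loaded.
set.seed(3)
data <- data.frame(
  Salary = sapply(rep(1:7, each = 10),
                  function(x) floor(runif(1, x * 10000, x * 20000))),
  Satisfaction = rep(1:7, each = 10)
)

# Split by Satisfaction, apply summarise to each piece, combine the results
by_group <- plyr::ddply(data, "Satisfaction", plyr::summarise,
                        Mean = mean(Salary),
                        SD   = sd(Salary))
by_group
```

This should give one row per Satisfaction level with Mean and SD columns, returned as a plain data frame rather than a tibble.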

Recall that the %>% pipe operator passes the result of the expression on its left to the function on its right as that function's first argument.
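To make the pipe concrete, here is a small sketch comparing a piped call with the equivalent nested call. The data frame `toy` is a hypothetical example, not the asker's data.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(Salary = c(10, 20, 30, 40),
                  Satisfaction = c(1, 1, 2, 2))

# Piped form: each result is passed on as the first argument of the next call
piped <- toy %>%
  group_by(Satisfaction) %>%
  summarise(Mean = mean(Salary))

# Equivalent nested form without the pipe
nested <- summarise(group_by(toy, Satisfaction), Mean = mean(Salary))

# Compare the two results as plain data frames
all.equal(as.data.frame(piped), as.data.frame(nested))
```

The nested form reads inside-out, which is the main reason the piped form is usually preferred for longer chains.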

Example data:

set.seed(3)
data <- data.frame(Salary = sapply(rep(1:7,each = 10), function(x){floor(runif(1,x*10000,x*20000))}),
                   Satisfaction = rep(1:7,each = 10))
Ian Campbell
  • employee %>% group_by(JobSatisfaction) %>% summarise(Mean = mean(MonthlyIncome), SD = sd(MonthlyIncome))
    # A tibble: 4 x 3
      JobSatisfaction  Mean    SD
    1 Low             6562. 4645.
    2 Medium          6527. 4867.
    3 High            6480. 4798.
    4 Very High       6473. 4574.
    – MikeMilles May 23 '20 at 16:01
  • Is there an easy way to post the result? – MikeMilles May 23 '20 at 16:02
  • I would recommend editing your original question or opening a new question to post output. You can surround the output with three backticks (```) to improve formatting. – Ian Campbell May 23 '20 at 16:06
  • Ok, will do, it seems I have to wait 90 minutes. Thanks again! – MikeMilles May 23 '20 at 16:16