I need to calculate quartiles, but with the data grouped by two conditions: first by the position each employee holds in the company, and then by the union they belong to. Here is an example of my data:

Position      Union  Salary
Consultant    A      1000
Receptionist  B      700
Consultant    A      1250
Consultant    A      1200
HR            A      1100
HR            B      800
Receptionist  B      750
Student       B      200
HR            B      700
Consultant    A      900
Student       B      300
HR            A      1500
Consultant    A      1300
Consultant    B      800
Consultant    A      1300
Receptionist  B      780
Student       B      250
Consultant    B      950
HR            A      1150
Consultant    A      1275

I've tried several approaches, including some very early tests with `ddply`, and I've also tried `summarise`, for instance:

library(dplyr)
x %>%
  group_by(Union, Position) %>%
  summarise(Salary= quantile(Salary))

Can anyone help me with this?

EDIT: several users helped me out with a great solution in the comments, thank you very much. I have an additional question though:

I also need to calculate the mean of the salaries by the same conditions.

I tried using the code akrun provided (which worked very well for the quartiles), but when I adapted it for the mean, `x <- x %>% group_by(Union, Position) %>% do(data.frame(., as.list(mean(.$Salary))))`, it produced a separate variable for each group instead of a single mean variable added as the last column.

Can anyone tell me some way to fix this?
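
One possible way to get a single mean column, sketched under a few assumptions: `x` is a data frame with the three columns shown above, and the column name `MeanSalary` is only illustrative. The `do()` call probably creates one variable per group because `as.list(mean(.$Salary))` is an unnamed list, so `data.frame()` derives the column name from the value itself, which differs between groups. Using `mutate()` after `group_by()` avoids that, since it adds one named column and keeps every original row.

```r
library(dplyr)

# Example data copied from the question
x <- data.frame(
  Position = c("Consultant", "Receptionist", "Consultant", "Consultant", "HR",
               "HR", "Receptionist", "Student", "HR", "Consultant",
               "Student", "HR", "Consultant", "Consultant", "Consultant",
               "Receptionist", "Student", "Consultant", "HR", "Consultant"),
  Union = c("A", "B", "A", "A", "A", "B", "B", "B", "B", "A",
            "B", "A", "A", "B", "A", "B", "B", "B", "A", "A"),
  Salary = c(1000, 700, 1250, 1200, 1100, 800, 750, 200, 700, 900,
             300, 1500, 1300, 800, 1300, 780, 250, 950, 1150, 1275),
  stringsAsFactors = FALSE
)

# Add the group mean as one extra column; every original row is kept.
# `MeanSalary` is a hypothetical column name chosen for this sketch.
with_mean <- x %>%
  group_by(Union, Position) %>%
  mutate(MeanSalary = mean(Salary)) %>%
  ungroup()

print(with_mean)
```

If one row per group is enough, `summarise(MeanSalary = mean(Salary))` in place of `mutate()` should give that instead.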

Nordsted
  • The only issue with that `summarise` solution is that you can do one quartile at a time. You could try `do`. – Axeman Oct 30 '17 at 13:00
  • Where is the `loen` column? If it is `Salary`, then try `x %>% group_by(Union, Position) %>% do(data.frame(., as.list(quantile(.$Salary))))` – akrun Oct 30 '17 at 13:00
  • dplyr's `summarise` expects a single value per group as return. Hence, you can do it using a list-column, like `x <- df %>% group_by(Position, Union) %>% summarise(q = list(quantile(Salary)))` and then either `x$q` or `library(tidyr); unnest(x)` – talat Oct 30 '17 at 13:04
  • @akrun: could you please remove the [duplicate]. I don't see how you can answer this question from the question you've linked to. – Nordsted Oct 30 '17 at 15:03
  • @Nordsted What is the expected output? The comments already showed a couple of ways, and the duplicate shows the same. Also, in your edit you are using `mean`, which is different from your original question about `quantile`. For `mean` you can use `x %>% group_by(Union, Position) %>% summarise(Salary = mean(Salary))` – akrun Oct 30 '17 at 15:04
  • @akrun They sure did, but I figured that instead of asking a new question, I would simply expand this question (it does still revolve around the group_by command and is within dplyr). Not sure if that was the right move though, I'm a little new to Stack Overflow as a user. Thank you very much for your answers. – Nordsted Oct 30 '17 at 15:12
  • @Nordsted I can reopen this post. But somebody else would dupe tag it later based on the similarity of the post. Also, there are lots of dupes for group by mean. So, I am not sure if reopening makes sense or not – akrun Oct 30 '17 at 15:14
  • @akrun Don't bother, I think I just need to learn how to better search for answers. Thank you very much for your assistance – Nordsted Oct 30 '17 at 15:19
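
As a rough illustration of the point made in the comments that `summarise` expects a single value per group, here is a minimal sketch that computes each quartile as its own column. It assumes the data frame is called `x` as in the question; the output column names (`Min`, `Q1`, `Median`, `Q3`, `Max`) are hypothetical and chosen only for this example. It is an alternative to the `do()` and list-column suggestions above, not a replacement for them.

```r
library(dplyr)

# Example data copied from the question
x <- data.frame(
  Position = c("Consultant", "Receptionist", "Consultant", "Consultant", "HR",
               "HR", "Receptionist", "Student", "HR", "Consultant",
               "Student", "HR", "Consultant", "Consultant", "Consultant",
               "Receptionist", "Student", "Consultant", "HR", "Consultant"),
  Union = c("A", "B", "A", "A", "A", "B", "B", "B", "B", "A",
            "B", "A", "A", "B", "A", "B", "B", "B", "A", "A"),
  Salary = c(1000, 700, 1250, 1200, 1100, 800, 750, 200, 700, 900,
             300, 1500, 1300, 800, 1300, 780, 250, 950, 1150, 1275),
  stringsAsFactors = FALSE
)

# One row per Union/Position group, one column per quartile.
# Each expression returns a single value, which is what summarise expects.
quartiles <- x %>%
  group_by(Union, Position) %>%
  summarise(
    Min    = min(Salary),
    Q1     = quantile(Salary, 0.25, names = FALSE),
    Median = median(Salary),
    Q3     = quantile(Salary, 0.75, names = FALSE),
    Max    = max(Salary)
  ) %>%
  ungroup()

print(quartiles)
```

`quantile()` uses its default interpolation method here, so groups with only two or three salaries get interpolated quartiles rather than observed values.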

0 Answers