
How can we calculate the following from the example below using dplyr or other useful libraries:

  1. total number of schools by each state,
  2. total number of students by each school,
  3. total number of students by each school by Gender,
  4. total number of students by each school by Gender and type,
  5. mean of item1 and item3 by Gender,
  6. mean of item1 and item3 by Gender for each state.

ID = 1:50
states = rep(c("TS", "NE", "AR", "MO", "WA"),times = c(10, 10, 10, 10, 10))
schools = randomNames::randomNames(50) ## 50 random person names, used here as school names
Gender = rep(c("male", "female"),times = c(18,32))
type = rep(c("private", "public"),times = c(20,30))
item1 = rnorm(50, mean=25, sd=5)
item2 = rnorm(50, mean=30, sd=5)
item3 = rnorm(50, mean=15, sd=5)
df = data.frame(ID, states, schools, Gender, type, item1, item2, item3)

df

Thanks so much in advance.

  • 2
    https://stackoverflow.com/questions/25293045/count-number-of-rows-in-a-data-frame-in-r-based-on-group and https://stackoverflow.com/questions/21982987/mean-per-group-in-a-data-frame should be helpful. – Ronak Shah Feb 09 '21 at 10:34
  • 2
    To avoid downvotes on questions like this, I would show some kind of attempt. You tagged "dplyr", which suggests you know where to look, so it can come across as trying to get someone else to do your exercise. – Claudio Paladini Feb 09 '21 at 11:05

2 Answers

TL;DR: You should do some research before asking.

Here on Stack Overflow we try to help each other, and before asking for assistance we usually do thorough research first. In your case, you could have read the dplyr or tidyverse documentation. I understand that this can be boring, but it is better than just getting an answer from a random Stack Overflow user.

Study the group_by and summarise functions by opening their help pages from the R console (e.g., ?group_by).

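library(dplyr)

# 1 total number of schools by state
# (n() counts rows; this equals the number of schools only because each school has one row)

schools_by_state <- df %>%
  group_by(states) %>%
  summarise(number = n())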

In your sample dataset the school names are unique, so every school has exactly one row. That is why the results might look confusing or meaningless.
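To make the point about unique school names concrete, here is a minimal sketch that separates "rows per state" from "distinct schools per state". It assumes a hypothetical variant of the data in which school names repeat (25 made-up names, two students each), which is not what the question's sample contains; n_distinct() is dplyr's counter for unique values.

```r
library(dplyr)

# Hypothetical stand-in for the question's data: school names are invented
# and repeated on purpose, so that rows and schools are no longer the same.
df <- data.frame(
  ID      = 1:50,
  states  = rep(c("TS", "NE", "AR", "MO", "WA"), each = 10),
  schools = paste0("school_", rep(1:25, each = 2))  # 25 schools, 2 rows each
)

schools_by_state <- df %>%
  group_by(states) %>%
  summarise(
    students = n(),                 # number of rows (students) in the state
    schools  = n_distinct(schools)  # number of different school names
  )

schools_by_state
```

With the question's original data both columns would hold the same value, because every school appears only once.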

# 2 total number of students by school

students <- df %>%
  group_by(schools) %>%
  summarise(students = n())

# 3 by school and gender

students_gender <- df %>%
  group_by(schools, Gender) %>%
  summarise(stud_gend = n())

# 4 by school, gender and type

stud_gend_type <- df %>%
  group_by(schools, Gender, type) %>%
  summarise(studs = n())

As you can see, the principle is very simple, so I have left the last two tasks for you to do on your own.
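For reference, the two remaining tasks follow the same group_by() and summarise() pattern, with mean() in place of n(). The following is only a sketch: it rebuilds just the columns it needs, the seed is arbitrary, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the illustration is reproducible

# Reduced copy of the question's data (only the columns used below)
df <- data.frame(
  states = rep(c("TS", "NE", "AR", "MO", "WA"), each = 10),
  Gender = rep(c("male", "female"), times = c(18, 32)),
  item1  = rnorm(50, mean = 25, sd = 5),
  item3  = rnorm(50, mean = 15, sd = 5)
)

# 5 mean of item1 and item3 by Gender
means_gender <- df %>%
  group_by(Gender) %>%
  summarise(item1 = mean(item1), item3 = mean(item3))

# 6 mean of item1 and item3 by Gender within each state
means_gender_state <- df %>%
  group_by(states, Gender) %>%
  summarise(item1 = mean(item1), item3 = mean(item3), .groups = "drop")

means_gender
means_gender_state
```

Only the Gender and state combinations that actually occur in the data get a row in the second result.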

rg4s

The code below covers the essence of the tasks. The main dplyr functions you are after are group_by() and summarise():

library(dplyr)

ID = 1:50
states = rep(c("TS", "NE", "AR", "MO", "WA"),times = c(10, 10, 10, 10, 10))
schools = randomNames::randomNames(50) ## 50 random person names, used here as school names
Gender = rep(c("male", "female"),times = c(18,32))
type = rep(c("private", "public"),times = c(20,30))
item1 = rnorm(50, mean=25, sd=5)
item2 = rnorm(50, mean=30, sd=5)
item3 = rnorm(50, mean=15, sd=5)
df = data.frame(ID, states, schools, Gender, type, item1, item2, item3)

head(df)

# total number of schools by each state,

df %>% 
  group_by(states) %>% 
  summarise(number = n())

# total number of students by each school, Gender and type
# (drop type, or both Gender and type, from group_by() for the coarser counts)

df %>% 
  group_by(schools, Gender, type) %>% 
  summarise(number = n())

# mean of item1 and item3 by Gender for each state
# (use group_by(Gender) alone for the means by Gender only)

df %>% 
  group_by(Gender, states) %>% 
  summarise(item1 = mean(item1),
            item3 = mean(item3))
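If you want the three student counts as separate tables, one option is dplyr's count(), which is shorthand for group_by() followed by summarise(n = n()). The sketch below is only an illustration: it uses invented, repeating school names instead of randomNames so the counts are not all 1, and the column name "students" is my own choice.

```r
library(dplyr)

# Hypothetical stand-in for the question's data (school names are made up)
df <- data.frame(
  schools = paste0("school_", rep(1:25, each = 2)),
  Gender  = rep(c("male", "female"), times = c(18, 32)),
  type    = rep(c("private", "public"), times = c(20, 30))
)

# total number of students by each school
by_school <- df %>% count(schools, name = "students")

# total number of students by each school by Gender
by_school_gender <- df %>% count(schools, Gender, name = "students")

# total number of students by each school by Gender and type
by_school_gender_type <- df %>% count(schools, Gender, type, name = "students")

head(by_school_gender_type)
```

Each result is an ungrouped data frame, so no extra ungroup() step is needed afterwards.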
Claudio Paladini