Calculate mean of column data based on conditions in another column

Question

I have a dataframe with a transnational data structure that looks something similar to the below:

ID   RANK   GRADE
123  E1     0
123  E1     42
123  E1     NA
123  E2     41
123  E2     42
456  E2     41
456  E2     41
456  E3     NA

I want to calculate the mean of the Grade column for each Rank based on the ID, ignoring the values that are 0 because they are data entry errors, and ignoring the NA's.

For example: For ID 123, I want the mean of Grade when their rank was E1, then for E2, etc.

score 1 · Accepted Answer · answered Feb 12 '18 at 18:44

You can use group_by and summarize from the dplyr package:

library(dplyr)

df %>%
  filter(!is.na(GRADE),
         GRADE != 0) %>%
  group_by(ID, RANK) %>%
  summarize(mean_grade = mean(GRADE))

The filter function is to remove any rows where GRADE is NA or 0

Calculate mean of column data based on conditions in another column

1 Answers1

Linked