0

I have a dataframe with a transnational data structure that looks something similar to the below:

ID   RANK   GRADE
123  E1     0
123  E1     42
123  E1     NA
123  E2     41
123  E2     42
456  E2     41
456  E2     41
456  E3     NA

I want to calculate the mean of the Grade column for each Rank based on the ID, ignoring the values that are 0 because they are data entry errors, and ignoring the NA's.

For example: For ID 123, I want the mean of Grade when their rank was E1, then for E2, etc.

pogibas
  • 27,303
  • 19
  • 84
  • 117
ckdf14
  • 61
  • 1
  • 9

1 Answers1

1

You can use group_by and summarize from the dplyr package:

library(dplyr)

df %>%
  filter(!is.na(GRADE),
         GRADE != 0) %>%
  group_by(ID, RANK) %>%
  summarize(mean_grade = mean(GRADE))

The filter function is to remove any rows where GRADE is NA or 0

tbradley
  • 2,210
  • 11
  • 20