Calculate mean for recurring observations of one column but with differing values of other column

Question

I have a data frame of about 35'000 observations. The problem is that there are about 5'000 occurrences (as exemplified by the first two and last two rows of the image) where two observations relate to the same COD_DOM but have differing values of RENDIMENTO. I would like to calculate the average RENDIMENTO for every COD_DOM that appears twice, and keep only one observation per COD_DOM holding that average value.

Something like `library(dplyr)`, `data %>% group_by(COD_DOM) %>% summarise(RENDIMENTO = mean(RENDIMENTO))`? — iago, Aug 10 '21 at 10:02

score 1 · Accepted Answer · answered Aug 10 '21 at 10:06

If your data.frame is just these two columns, you should be able to use:

library(dplyr)

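# `data` stands for your own data frame; `data.frame` itself is a base R
# function, so it cannot be piped into group_by()
new_df <- data %>%
     group_by(COD_DOM) %>%                     # one group per COD_DOM
     summarize(RENDIMENTO=mean(RENDIMENTO))    # one row per group, holding the mean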
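
To check the behaviour before running it on the full data, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical and only mimic the structure described in the question; the names `toy` and `toy_means` are not from the original post.

```r
library(dplyr)

# Hypothetical toy data: COD_DOM 101 and 103 each appear twice,
# COD_DOM 102 appears once
toy <- data.frame(
  COD_DOM    = c(101, 101, 102, 103, 103),
  RENDIMENTO = c(1000, 2000, 1500, 800, 1200)
)

# Collapse to one row per COD_DOM, holding the mean RENDIMENTO
toy_means <- toy %>%
  group_by(COD_DOM) %>%
  summarise(RENDIMENTO = mean(RENDIMENTO))

print(toy_means)
```

A COD_DOM that appears only once keeps its original value, since the mean of a single number is that number, so there is no need to treat duplicated and non-duplicated rows separately. If RENDIMENTO may contain missing values, `mean(RENDIMENTO, na.rm = TRUE)` ignores them instead of returning `NA` for that group.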

jmalston
