
I have a large dataset with around 100 columns. Each row is a sub-village unit (booth), so a village may appear more than once. Village-specific columns such as "population" and "capital" have the same value in every booth row of a village. Other variables, such as "population_min", differ between booths, so I want to take their arithmetic mean when collapsing.

df_input <- data.frame(
  booth          = c(1, 2, 3, 4, 5, 6),
  village        = c("A", "B", "B", "C", "D", "D"),
  capital        = c("alpha", "beta", "beta", "gamma", "sigma", "sigma"),
  population     = c(1000, 1500, 1500, 2000, 1700, 1700),
  population_min = c(100, 200, 300, 400, 500, 600)
)

df_output <- data.frame(
  village        = c("A", "B", "C", "D"),
  population     = c(1000, 1500, 2000, 1700),
  capital        = c("alpha", "beta", "gamma", "sigma"),
  population_min = c(100, 250, 400, 550)
)

So basically, I want to collapse the data to the village level, keeping the village-specific variables unchanged and taking the mean of the variables that differ across booths within a village.
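
To make the goal concrete, here is a rough base-R sketch of the kind of collapse I have in mind. It is not a definitive solution: it assumes every numeric column can be averaged (averaging a column that is constant within a village just returns that constant), that non-numeric columns are constant within a village so the first value can be kept, and that the booth ID can be dropped. The names `collapse_one`, `id_cols`, `pieces`, and `df_collapsed` are made up for illustration.

```r
df_input <- data.frame(
  booth          = c(1, 2, 3, 4, 5, 6),
  village        = c("A", "B", "B", "C", "D", "D"),
  capital        = c("alpha", "beta", "beta", "gamma", "sigma", "sigma"),
  population     = c(1000, 1500, 1500, 2000, 1700, 1700),
  population_min = c(100, 200, 300, 400, 500, 600),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean for numeric columns, first value otherwise
# (assumes non-numeric columns are constant within a village).
collapse_one <- function(col) {
  if (is.numeric(col)) mean(col) else col[1]
}

# Columns that are not collapsed: the grouping key and the booth ID,
# which has no meaning at the village level (assumption).
id_cols <- c("booth", "village")

# Split the rows by village and reduce each group to a single row.
pieces <- lapply(split(df_input, df_input$village), function(g) {
  as.data.frame(lapply(g[setdiff(names(g), id_cols)], collapse_one),
                stringsAsFactors = FALSE)
})

# Stack the one-row pieces and put the village name back as a column.
df_collapsed <- data.frame(
  village = names(pieces),
  do.call(rbind, pieces),
  stringsAsFactors = FALSE,
  row.names = NULL
)

print(df_collapsed)
```

With around 100 columns, the point of the helper is that no column has to be named individually; whether a faster grouped approach exists for large data is exactly what I am asking.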

Fuser
