
I need to create a third column in the data frame below (called teste) that contains the mean price for the vehicle model on each row: on a car row it should hold the mean over all car rows, and likewise for bikes and trucks.

    model   price
    car     10
    car     11
    car     12
    car     13
    car     14
    bike    5
    bike    6
    bike    7
    bike    8
    bike    9
    truck   12
    truck   13
    truck   14
    truck   15
    truck   16

I was able to create a for loop which can print the desired results with the following R code:

    for(x in teste$model){
      print(mean(teste[teste$model==x, ]$price))
    }

However, when I try to store the result in a third column, the code below gives an error stating that the replacement has more rows than the data.

    teste$media <- rep(NA, 15)
    for(x in teste$model){
      teste$media[x] <- mean(teste[teste$model==x, ]$price)
    }

I have no idea why the replacement vector is bigger. Can anyone help me identify the error or propose another way to accomplish the goal?

Thank you all in advance

Alex

alexdamado

2 Answers


Use ave, which applies mean within each group by default. See ?ave.

    > teste$media <- ave(teste$price, teste$model)
    > teste
       model price media
    1    car    10    12
    2    car    11    12
    3    car    12    12
    4    car    13    12
    5    car    14    12
    6   bike     5     7
    7   bike     6     7
    8   bike     7     7
    9   bike     8     7
    10  bike     9     7
    11 truck    12    14
    12 truck    13    14
    13 truck    14    14
    14 truck    15    14
    15 truck    16    14
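
Since mean is only the default, a different summary can be passed through the FUN argument, which has to be named because it comes after `...` in ave's signature. A minimal sketch, assuming the data frame is rebuilt from the table in the question; the mediana column is a hypothetical name used only for illustration:

```r
# Rebuild the example data from the question so the block runs on its own
teste <- data.frame(
  model = rep(c("car", "bike", "truck"), each = 5),
  price = c(10:14, 5:9, 12:16)
)

# Group mean per row (FUN defaults to mean)
teste$media <- ave(teste$price, teste$model)

# Same idea with another summary; FUN must be passed by name
teste$mediana <- ave(teste$price, teste$model, FUN = median)
```

The result has the same length and row order as the input, which is why it can be assigned straight into a new column.
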
Jilber Urbina

With dplyr (assign the result back to teste to keep the new column):

    library(dplyr)

    teste <- teste %>%
      group_by(model) %>%
      mutate(media = mean(price)) %>%
      ungroup()

Or with data.table:

    library(data.table)

    setDT(teste)[ , media := mean(price), by = model]
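
On the error in the question: for(x in teste$model) sets x to a model name such as "car", so teste$media[x] indexes by name rather than by row. Because the vector has no element with that name, R appends a 16th element, and the data frame then rejects a replacement that is longer than its 15 rows. A minimal sketch of a loop that does work, assuming the same data (rebuilt here so the block runs on its own), iterates over the unique models and assigns through a logical index:

```r
# Rebuild the example data from the question
teste <- data.frame(
  model = rep(c("car", "bike", "truck"), each = 5),
  price = c(10:14, 5:9, 12:16)
)

# Start with a numeric NA column (recycled to every row)
teste$media <- NA_real_

# One pass per distinct model, filling all rows of that model at once
for (m in unique(teste$model)) {
  rows <- teste$model == m                      # logical index of matching rows
  teste$media[rows] <- mean(teste$price[rows])  # assign the group mean to those rows
}
```

This is shown only to explain the failure; the vectorised approaches are shorter and usually preferable.
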
eipi10