I started learning R last month and am currently learning the aggregate function.
To start off, I have a data frame called property_data and I am trying to get the mean price per city.
I first used the formula method of aggregate:
mean_price_per_city_1 <- aggregate(PRICE ~ PROPERTYCITY,
                                   property_data, mean)
The results are as follows (just the head):
PROPERTYCITY | PRICE |
---|---|
1.00 | |
ALLISON PARK | 193814.08 |
AMBRIDGE | 62328.92 |
ASPINWALL | 226505.50 |
BADEN | 400657.52 |
BAIRDFORD | 59337.37 |
Then I decided to try the data frame method:
mean_price_per_city_2 <- aggregate(list(property_data$PRICE),
                                   by = list(property_data$PROPERTYCITY),
                                   FUN = mean)
The results are as follows (just the head):
Group.1 | c.12000L.. 1783L..4643L.. |
---|---|
1.00 | |
ALLISON PARK | NA |
AMBRIDGE | 62328.92 |
ASPINWALL | 226505.50 |
BADEN | 400657.52 |
BAIRDFORD | 59337.37 |
I thought that the two methods would return the same results. However, I noticed that when I used the data frame method, there are NAs in the second column (for example, ALLISON PARK is NA instead of 193814.08).
I tried checking whether there are NAs in the PRICE column, but there appear to be none. So I am lost as to why the two methods don't return the same values.
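For reference, here is a minimal sketch of how the NA check and the two calls can be compared side by side. The `toy` data frame and its values are made up for illustration and are not my real data. The one documented difference I am aware of is that the formula method defaults to `na.action = na.omit`, while the data frame method passes values to `FUN` unchanged, so this may or may not be what is happening with my data.

```r
# Hypothetical toy data; the values are made up for illustration.
toy <- data.frame(
  PROPERTYCITY = c("A", "A", "B", "B"),
  PRICE = c(100, NA, 200, 300)
)

# Two common ways to check a column for missing values.
na_count <- sum(is.na(toy$PRICE))  # number of NA entries
has_na   <- anyNA(toy$PRICE)       # TRUE if at least one NA is present

# Formula method: rows containing NA are dropped before FUN is applied
# (the default is na.action = na.omit).
by_formula <- aggregate(PRICE ~ PROPERTYCITY, data = toy, FUN = mean)

# Data frame method: NA values reach mean(), which returns NA for that group.
# Naming the list elements gives readable column names in the result.
by_df <- aggregate(list(PRICE = toy$PRICE),
                   by = list(PROPERTYCITY = toy$PROPERTYCITY),
                   FUN = mean)

# Extra arguments are passed on to FUN, so na.rm = TRUE goes to mean().
by_df_narm <- aggregate(list(PRICE = toy$PRICE),
                        by = list(PROPERTYCITY = toy$PROPERTYCITY),
                        FUN = mean, na.rm = TRUE)
```

On the toy data, group A contains one NA, so `by_df` and `by_formula` differ for that group only, while `by_df_narm` matches the formula result.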