Why do the data frame and formula methods of aggregate not return the same results?

Question

I just started learning R last month and I am learning the aggregate functions.

To start off, I have a data frame called property_data and I am trying to get the mean price per city.

I first used the formula method of aggregate:

mean_price_per_city_1 <- aggregate(PRICE ~ PROPERTYCITY,
  property_data, mean)

The results are as follows (just the head):

PROPERTYCITY	PRICE
	1.00
ALLISON PARK	193814.08
AMBRIDGE	62328.92
ASPINWALL	226505.50
BADEN	400657.52
BAIRDFORD	59337.37

Then I decided to try the data frame method:

mean_price_per_city_2 <- aggregate(list(property_data$PRICE),
                             by = list(property_data$PROPERTYCITY),
                             FUN = mean)

The results are as follows (just the head):

Group.1	c.12000L.. 1783L..4643L..
	1.00
ALLISON PARK	NA
AMBRIDGE	62328.92
ASPINWALL	226505.50
BADEN	400657.52
BAIRDFORD	59337.37

I thought that the two methods would return the same results. However, I noticed that when I used the data frame method, there are NAs in the second column.

I tried checking whether there are NAs in the PRICE column, but I found none. So I am lost as to why the two methods don't return the same values.

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Nov 04 '21 at 02:30

score 3 · Answer 1 · answered Nov 04 '21 at 02:47

You have two issues. First, aggregate(list(property_data$PRICE), by = list(property_data$PROPERTYCITY), FUN = mean) should be given property_data$PRICE directly, without wrapping it in list(). Only the by= argument must be a list. The extra list() is why your column name is so strange. Second, as documented in the manual page (?aggregate), the formula method has a default of na.action = na.omit, but the method for class data.frame does not. Since you have at least one missing value in the ALLISON PARK group, the formula call dropped that row before computing the mean, while the second call passed the NA on to mean(), so the result for ALLISON PARK is NA.
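
A minimal sketch of the difference, using a made-up four-row property_data (the real data is not shown in the question, so the values below are assumptions for illustration only). It also shows one possible way to make the data frame method agree with the formula method, by passing na.rm = TRUE through to mean(), and a quick per-city count of missing prices.

```r
# Hypothetical toy data standing in for the asker's property_data.
property_data <- data.frame(
  PROPERTYCITY = c("ALLISON PARK", "ALLISON PARK", "AMBRIDGE", "AMBRIDGE"),
  PRICE        = c(150000, NA, 60000, 70000)
)

# Count missing prices per city to see which groups are affected.
na_per_city <- tapply(is.na(property_data$PRICE),
                      property_data$PROPERTYCITY, sum)

# Formula method: na.action = na.omit drops the NA row before grouping.
by_formula <- aggregate(PRICE ~ PROPERTYCITY, data = property_data, FUN = mean)

# Data frame method: the NA reaches mean(), so that group's mean is NA.
by_df <- aggregate(property_data["PRICE"],
                   by = list(PROPERTYCITY = property_data$PROPERTYCITY),
                   FUN = mean)

# Extra arguments are passed on to FUN, so na.rm = TRUE goes to mean().
by_df_narm <- aggregate(property_data["PRICE"],
                        by = list(PROPERTYCITY = property_data$PROPERTYCITY),
                        FUN = mean, na.rm = TRUE)

print(na_per_city)
print(by_formula)
print(by_df)
print(by_df_narm)
```

Naming the element of the by list (PROPERTYCITY = ...) and subsetting with property_data["PRICE"] keeps readable column names in the result instead of Group.1 and a deparsed vector.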
