
I was just using the `aggregate` function (see the short example below). But how does `aggregate` know which of my `randomnumb` values belongs to which country? Is my list somehow still storing the countries, or is it just a matter of order?


df <- data.frame(country = c("Canada", "Canada", "Canada", "US", "US"),
                 state = c("state1", "state2", "state3", "state4", "state5"),
                 randomnumb = 1:5)

list <- list(df$randomnumb)

dfaggregate <- aggregate(list,
                         by = list(country = df$country),
                         FUN = mean)

titeuf
  • You'd better use the formula notation `aggregate(randomnumb~country,data=df,sum)`, as lists can sometimes be confusing! – Duck Aug 20 '20 at 15:32
  • @Duck, but be very careful, as running `mean` behaves differently by default in list-style and formula-style `aggregate` with respect to missing values: [aggregate methods treat missing values (NA) differently](https://stackoverflow.com/questions/16844613/aggregate-methods-treat-missing-values-na-differently). – Parfait Aug 20 '20 at 19:06
  • @Parfait Nice info! Thanks! – Duck Aug 20 '20 at 19:08

1 Answer


It is just a matter of order: `aggregate` pairs the i-th value with the i-th element of the `by` vector. Let's first compute the result for your data above:

aggregate(list,
          by = list(country = df$country),
          FUN = mean)
  country X1.5
1  Canada  2.0
2      US  4.5
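To see that the list carries no country information and that the grouping comes only from matching positions, here is a small sketch. It rebuilds your `df`, uses the hypothetical name `vals` instead of `list` (to avoid reusing the name of the base function `list()`), and uses `split()` plus `sapply()` as a rough stand-in for the grouping `aggregate` performs; it is an illustration, not `aggregate`'s actual implementation:

```r
# Rebuild the example data from the question
df <- data.frame(country = c("Canada", "Canada", "Canada", "US", "US"),
                 state = c("state1", "state2", "state3", "state4", "state5"),
                 randomnumb = 1:5)

vals <- list(df$randomnumb)  # hypothetical name; same content as `list` above
str(vals)                    # a list holding one integer vector, no country labels

# Pair value i with country i, then average each group
groups <- split(df$randomnumb, df$country)
means <- sapply(groups, mean)
means  # Canada 2.0, US 4.5, matching the aggregate() result above
```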

Now let's reverse the order of the countries:

aggregate(list,
          by = list(country = rev(df$country)),
          FUN = mean)
  country X1.5
1  Canada  4.0
2      US  1.5

As you can see, the result is different: it is exactly what you would get from this data.frame, in which the countries appear in reversed order:

data.frame(country = c("US", "US", "Canada","Canada","Canada"),
           state = c("state1", "state2", "state3", "state4", "state5"),
           randomnumb = c(1:5))
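To confirm, the sketch below (with the hypothetical name `df_rev`) builds that data.frame and aggregates it with the formula notation. Because the grouping now comes from two columns of the same data.frame, it reproduces the reversed result above:

```r
# The data.frame that matches rev(df$country) position by position
df_rev <- data.frame(country = c("US", "US", "Canada", "Canada", "Canada"),
                     state = c("state1", "state2", "state3", "state4", "state5"),
                     randomnumb = 1:5)

res <- aggregate(randomnumb ~ country, data = df_rev, FUN = mean)
res  # Canada 4.0 (mean of 3, 4, 5), US 1.5 (mean of 1, 2)
```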

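So it depends only on the order: the list holds nothing but the numbers, and `aggregate` matches them to the countries by position. As Duck said, the formula notation is clearer, because both columns are taken from the same data.frame: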

aggregate(randomnumb~country, data = df, mean)
  country randomnumb
1  Canada        2.0
2      US        4.5
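One caveat, raised in Parfait's comment on the question: the two styles treat missing values differently. A minimal sketch, with a hypothetical `df_na` containing one `NA`: the formula method uses `na.action = na.omit` by default and drops that row, while the data-frame method with `by =` passes the `NA` on to `mean()`, unless you add `na.rm = TRUE`, which `aggregate` forwards to `FUN`:

```r
# Hypothetical data: one missing value in the Canada group
df_na <- data.frame(country = c("Canada", "Canada", "Canada", "US", "US"),
                    randomnumb = c(1, NA, 3, 4, 5))

# Formula method: the NA row is dropped before grouping (na.action = na.omit)
f <- aggregate(randomnumb ~ country, data = df_na, FUN = mean)

# Data-frame method: the NA reaches mean(), so Canada's mean is NA
d <- aggregate(df_na["randomnumb"], by = list(country = df_na$country), FUN = mean)

# Extra arguments are passed on to FUN, so na.rm = TRUE removes the NA here
d2 <- aggregate(df_na["randomnumb"], by = list(country = df_na$country),
                FUN = mean, na.rm = TRUE)
```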
starja