
I'm not sure this question belongs here, but since it's specific and (I think) has a single answer, I'm asking it here:

I'm trying to understand some behavior of ggplot (through plotnine in Python, which is practically a copy of ggplot). Specifically, I want to understand how the group and color arguments behave when they are placed in the aes() of ggplot() versus the aes() of a geom such as geom_line().

We have this data:

import pandas as pd
from plotnine import ggplot, aes, geom_line

data = pd.DataFrame({'Period': {0: '2019/07', 1: '2019/07', 2: '2019/07', 3: '2019/08', 4: '2019/09', 5: '2019/09', 6: '2019/10', 7: '2019/10', 8: '2019/11', 9: '2019/11', 10: '2019/12', 11: '2019/12', 12: '2019/12', 13: '2020/01', 14: '2020/01', 15: '2020/01', 16: '2020/02', 17: '2020/02', 18: '2020/02', 19: '2020/03', 20: '2020/03', 21: '2020/03'},
                     'Category': {0: 'A', 1: 'B', 2: 'C', 3: 'A', 4: 'A', 5: 'C', 6: 'A', 7: 'C', 8: 'A', 9: 'C', 10: 'A', 11: 'B', 12: 'C', 13: 'A', 14: 'B', 15: 'C', 16: 'A', 17: 'B', 18: 'C', 19: 'A', 20: 'B', 21: 'C'},
                     'Income': {0: 350.6, 1: 52.4, 2: 33.4, 3: 105.5, 4: 203.4, 5: 114.7, 6: 272.3, 7: 157.4, 8: 288.0, 9: 24.1, 10: 345.5, 11: 27.2, 12: 10.8, 13: 187.8, 14: 111.7, 15: 49.2, 16: 293.1, 17: 77.7, 18: 132.8, 19: 221.8, 20: 27.6, 21: 87.0}})

And when I try to plot it:

(ggplot(data, aes(x="Period", y="Income", color="Category"))
 + geom_line())


It gives this warning:

PlotnineWarning: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?

The warning says that each group has only one observation, but I assumed the real cause was that some Category and Period combinations have no Income value. I do know there is only one observation per Period and Category pair, because I built data by grouping on Period and Category, but I don't understand which grouping the warning is referring to.

I solved it by adding the argument group="Category":

(ggplot(data, aes(x="Period", y="Income", color="Category", group = "Category"))
 + geom_line())


But if I move the color and group arguments into geom_line(aes()) like this:

(ggplot(data, aes(x="Period", y="Income"))
 + geom_line(aes(color="Category", group = "Category")))

I get exactly the same plot. Why? What's the difference between setting them in one aes() or the other?

Also, where should I put group = 1, as in this answer? When I try it with data, I can't understand what ggplot is actually doing:

(ggplot(data, aes(x="Period", y="Income", color="Category", group = 1))
 + geom_line())


Chris

1 Answer


If group is not explicitly mapped, it is computed from the combination of all the other discrete aesthetics. Of these mappings

aes(x="Period", y="Income", color="Category")

Period and Category are discrete, and they combine in such a way that none of the resulting groups has more than one point. You can view the computed group with

(ggplot(data, aes(x="Period", y="Income", color="Category"))
 + geom_line()
 + geom_label(aes(label='stat(group)'))
)

Plot of assigned group
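The same grouping can be reproduced without plotting. The sketch below is a simplified model, not plotnine's actual code: the helper implicit_groups is hypothetical, and it simply numbers each unique (Period, Category) combination, mirroring the idea that the default group comes from all the discrete aesthetics combined.

```python
from collections import Counter

# The two discrete columns from the question's data, in row order.
periods = ['2019/07', '2019/07', '2019/07', '2019/08', '2019/09', '2019/09',
           '2019/10', '2019/10', '2019/11', '2019/11', '2019/12', '2019/12',
           '2019/12', '2020/01', '2020/01', '2020/01', '2020/02', '2020/02',
           '2020/02', '2020/03', '2020/03', '2020/03']
categories = ['A', 'B', 'C', 'A', 'A', 'C', 'A', 'C', 'A', 'C', 'A', 'B',
              'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']


def implicit_groups(*discrete_columns):
    """Hypothetical helper: give every unique combination of the discrete
    columns its own group id (ids follow the sorted order of combinations)."""
    keys = list(zip(*discrete_columns))
    ids = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}
    return [ids[key] for key in keys]


groups = implicit_groups(periods, categories)
sizes = Counter(groups)
# 22 rows, 22 groups, and no group has more than one row,
# so there is nothing to connect with a line.
print(len(sizes), max(sizes.values()))  # 22 1
```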

You need two or more points in a group to draw a line. If you want all points to belong to the same group, set group to a constant; that is what group=1 achieves.
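To see why group="Category" draws three lines while group=1 draws one, it can help to count rows per group under each mapping. This is a plain-Python sketch (group_sizes is a hypothetical helper, not a plotnine function); it only counts rows and does not reproduce how plotnine orders or colours the resulting path.

```python
from collections import Counter

# Category column from the question's data, in row order.
categories = ['A', 'B', 'C', 'A', 'A', 'C', 'A', 'C', 'A', 'C', 'A', 'B',
              'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']


def group_sizes(group_values):
    """Hypothetical helper: number of rows in each group.
    A line can only be drawn for groups with at least two rows."""
    return dict(sorted(Counter(group_values).items()))


by_category = group_sizes(categories)          # like group="Category"
constant = group_sizes([1] * len(categories))  # like group=1

print(by_category)  # {'A': 9, 'B': 5, 'C': 8}
print(constant)     # {1: 22}
```

With group=1 all 22 points form one group, so geom_line joins them into a single path that passes through every category at each Period, which is why that plot looks like one zigzagging line rather than three.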

About aes

There are 3 places where you can put aes mappings.

ggplot(data, aes(...)) + geom_point()    # 1
ggplot(data) + aes(...) + geom_point()   # 2
ggplot(data) + geom_point(aes(...))      # 3

1 and 2 give you global mappings, i.e. all the geoms can see them, while 3 is a local mapping: it applies only to that geom. In case of a conflict, the local mapping takes precedence.
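The precedence rule can be modelled as a dictionary merge. The function below is a hypothetical illustration of the behaviour described above, not plotnine's implementation; its inherit_aes argument mirrors the geom parameter of the same name, which lets a layer ignore the global mapping.

```python
def effective_mapping(global_aes, local_aes, inherit_aes=True):
    """Hypothetical model of how one layer's mapping is resolved.

    Start from the plot-level (global) mapping if the layer inherits it,
    then let the layer's own (local) mapping override matching keys.
    """
    merged = dict(global_aes) if inherit_aes else {}
    merged.update(local_aes)
    return merged


# The two versions from the question, written as plain dicts.
everything_global = effective_mapping(
    {"x": "Period", "y": "Income", "color": "Category", "group": "Category"}, {}
)
split = effective_mapping(
    {"x": "Period", "y": "Income"}, {"color": "Category", "group": "Category"}
)
print(everything_global == split)  # True: geom_line sees the same mapping

# On a conflict, the local mapping wins.
conflict = effective_mapping({"group": "Category"}, {"group": 1})
print(conflict["group"])  # 1
```

With a single geom, as in the question, both placements therefore give the same plot. The difference only shows up once a second geom (for example geom_point) is added, because that geom would see only the global mapping.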

has2k1