I don't know if this question belongs here, but since it's specific and (I think) has a single answer, I'm asking it here:
I'm trying to understand certain behavior in ggplot (through plotnine from Python, which is practically a copy of ggplot). Specifically, I'm trying to understand the behavior of the group and color arguments when they are placed in the aes() of ggplot() versus the aes() of geom_line(), for example.
We have this data:
import pandas as pd
from plotnine import ggplot, aes, geom_line
data = pd.DataFrame({'Period': {0: '2019/07', 1: '2019/07', 2: '2019/07', 3: '2019/08', 4: '2019/09', 5: '2019/09', 6: '2019/10', 7: '2019/10', 8: '2019/11', 9: '2019/11', 10: '2019/12', 11: '2019/12', 12: '2019/12', 13: '2020/01', 14: '2020/01', 15: '2020/01', 16: '2020/02', 17: '2020/02', 18: '2020/02', 19: '2020/03', 20: '2020/03', 21: '2020/03'},
'Category': {0: 'A', 1: 'B', 2: 'C', 3: 'A', 4: 'A', 5: 'C', 6: 'A', 7: 'C', 8: 'A', 9: 'C', 10: 'A', 11: 'B', 12: 'C', 13: 'A', 14: 'B', 15: 'C', 16: 'A', 17: 'B', 18: 'C', 19: 'A', 20: 'B', 21: 'C'},
'Income': {0: 350.6, 1: 52.4, 2: 33.4, 3: 105.5, 4: 203.4, 5: 114.7, 6: 272.3, 7: 157.4, 8: 288.0, 9: 24.1, 10: 345.5, 11: 27.2, 12: 10.8, 13: 187.8, 14: 111.7, 15: 49.2, 16: 293.1, 17: 77.7, 18: 132.8, 19: 221.8, 20: 27.6, 21: 87.0}})
And when I try to plot it:
(ggplot(data, aes(x="Period", y="Income", color="Category"))
+ geom_line())
It gives this warning:
PlotnineWarning: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
The warning says that each group has only one observation, but as far as I can tell it appears because there are no Income values for some combinations of Category and Period. I know there is only one observation per group, since I grouped by Period and Category to construct data, but I don't know what kind of grouping the warning is referring to.
I solved it by adding the argument group="Category":
(ggplot(data, aes(x="Period", y="Income", color="Category", group = "Category"))
+ geom_line())
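To see which grouping could produce "only one observation" per group, the rows can be counted directly in pandas. This is only a sketch under an assumption: ggplot2 documents that, when group is not set, the default group is the interaction of all discrete variables in the plot, and I am assuming plotnine does the same (here that would be Period and Category, since Period is a string column). The names layout, rows_per_default_group and rows_per_category are made up for this illustration.

```python
import pandas as pd

# Same Period/Category layout as `data` above (Income is not needed to count rows).
periods = (["2019/07"] * 3 + ["2019/08"] + ["2019/09"] * 2 + ["2019/10"] * 2
           + ["2019/11"] * 2 + ["2019/12"] * 3 + ["2020/01"] * 3
           + ["2020/02"] * 3 + ["2020/03"] * 3)
categories = list("ABCAACACACABCABCABCABC")
layout = pd.DataFrame({"Period": periods, "Category": categories})

# Assumed default grouping: one group per (Period, Category) combination.
rows_per_default_group = layout.groupby(["Period", "Category"]).size()

# Explicit group="Category": one group per category.
rows_per_category = layout.groupby("Category").size()

print(rows_per_default_group.max())  # largest group under the assumed default
print(rows_per_category.to_dict())   # rows available to each line with group="Category"
```

If that assumption holds, every default group contains a single row, so there is nothing to connect, while grouping by Category alone leaves several rows per line.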
But if I move the color and group arguments into geom_line(aes()) like this:
(ggplot(data, aes(x="Period", y="Income"))
+ geom_line(aes(color="Category", group = "Category")))
It gives me the exact same plot. Why? What's the difference between setting them in the different aes() calls?
Also, where should I set group = 1, as in this answer? Because when I try to use it with data I can't understand what ggplot is actually doing:
(ggplot(data, aes(x="Period", y="Income", color="Category", group = 1))
+ geom_line())