I have a DataFrame (data) with a simple integer index and 5 columns. The columns are Date, Country, AgeGroup, Gender, Stat. (Names changed to protect the innocent.) I would like to produce a FacetGrid where the Country defines the row, AgeGroup defines the column, and Gender defines the hue. For each of those particulars, I would like to produce a time series graph. I.e. I should get an array of graphs each of which has 2 time series on it (1 male, 1 female). I can get very close with:

import matplotlib.pyplot as plt
import seaborn as sns

g = sns.FacetGrid(data, row='Country', col='AgeGroup', hue='Gender')
g.map(plt.plot, 'Stat')

However, this just gives me the sample number on the x-axis rather than the dates. Is there a quick fix in this context?

More generally, I understand that the approach with FacetGrid is to make the grid and then map a plotting function to it. If I wanted to roll my own plotting function, what are the conventions it needs to follow? In particular, how can I write my own plotting function (to pass to map for FacetGrid) that accepts multiple columns worth of data from my dataset?

8one6
  • When you say "this just gives me the sample number on the x-axis rather than the dates", it's not clear where the dates should be coming from. Is this a different column in your dataframe? – mwaskom Sep 06 '14 at 16:33
  • Yes, see above, there is a column called `Date` and I'd like to use it to generate meaningful x-axis ticks. – 8one6 Sep 06 '14 at 16:53

1 Answer

I'll answer your more general question first. The rules for functions that you can pass to FacetGrid.map are:

  • They must take array-like inputs as positional arguments, with the first argument corresponding to the x axis and the second argument corresponding to the y axis (though more on the second condition shortly).
  • They must also accept two keyword arguments: color and label. If you want to use a hue variable, then these should get passed to the underlying plotting function, though you can just catch **kwargs and not do anything with them if it's not relevant to the specific plot you're making.
  • When called, they must draw a plot on the "currently active" matplotlib Axes.

There may be cases where your function draws a plot that looks correct without taking x and y positional inputs. I think that's basically what's going on here with the way you're using plt.plot. In that case it can be easier to just call, e.g., g.set_axis_labels("Date", "Stat") after you use map, which will label your axes properly. You may also want to do g.set(xticklabels=dates) to get more meaningful ticks.

There is also a more general function, FacetGrid.map_dataframe. The rules here are similar, but the function you pass must accept a dataframe input in a parameter called data, and instead of taking array-like positional inputs it takes strings that correspond to variables in that dataframe. On each iteration through the facets, the function will be called with the input dataframe masked to just the values for that combination of row, col, and hue levels.
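
A minimal sketch of the map_dataframe convention, which is also one way to use more than two columns in a single plotting function. The helper plot_band and the toy columns (t, low, high, group) are assumptions made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_band(x, low, high, data=None, color=None, label=None, **kwargs):
    """Hypothetical helper: shade the region between two columns of `data`."""
    ax = plt.gca()  # draw on the currently active facet
    # x, low, and high are column names (strings); data is the facet's subset.
    ax.fill_between(data[x], data[low], data[high],
                    color=color, alpha=0.3, label=label)


# Made-up data, purely for illustration.
toy = pd.DataFrame({
    "t": [0, 1, 2, 0, 1, 2],
    "low": [0.0, 0.5, 1.0, 1.0, 1.5, 2.0],
    "high": [1.0, 2.0, 3.0, 2.0, 3.0, 4.5],
    "group": ["a", "a", "a", "b", "b", "b"],
})

g = sns.FacetGrid(toy, col="group")
g.map_dataframe(plot_band, "t", "low", "high")
```

Because the function receives the subset dataframe itself, it can look up as many columns as it needs by name.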

So in your specific case, you'll need to write a function that we can call plot_by_date that should look something like this:

def plot_by_date(x, y, color=None, label=None):

    ...

(I'd be more helpful on the body, but I don't actually know how to do much with dates and matplotlib). The end result is that when you call this function it should plot on the currently-active Axes. Then do

g.map(plot_by_date, "Date", "Stat")

And it should work, I think.

mwaskom
  • What I meant was, if I want to roll my own function, what should it return? I.e. say I want to make a stupid function that just draws a horizontal line in each facet at `y=2` and ignores all the input data. Then what would that function look like? – 8one6 Sep 06 '14 at 16:54
  • Ah, the return value is ignored... the important thing is that the function *plots*. Actually for that specific example you can just do `g.map(plt.axhline, y=2)`. Not sure if that helps your general understanding though. – mwaskom Sep 06 '14 at 17:31
  • I'll play around with it a bit. I've gotten into a bit of a groove avoiding "just plotting" things, instead preferring to do things like `ax.plot` or `df.plot(ax=ax)` to be explicit about where I want the artists to do their work. So this is a bit "against the grain" for me. But I'll give it a shot. – 8one6 Sep 08 '14 at 19:26
  • This might be useful: http://nbviewer.ipython.org/gist/mwaskom/9276378379d757fe0cc6 – mwaskom Sep 08 '14 at 19:35