How to make multiple scatter subplots that share one axis?

Question

date	name	amount
1	harry	100
1	joe	20
2	harry	50
3	joe	60
3	lee	25
4	lee	60
4	harry	200
4	joe	90

I am trying to plot the data for 432 people, with all plots sharing the 'date' axis (x-axis). The resulting image was too large to show.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

dec=pd.read_csv('december.csv')
sns.lmplot(x='date', y='amount',
           data= dec, fit_reg=False, hue='name', legend=True, palette='Set1')

This code produces a single graph with 432 hues, but I want 432 separate graphs. How can I do that?

Please see how to ask a StackOverflow question: https://stackoverflow.com/help/how-to-ask . For `pandas` specifically, please include sample input data (no images), the code you have tried, and the expected output. Please see: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — David Erickson, Jan 19 '21 at 19:12
You are going to want to use seaborn for this. Look into `catplot` (https://seaborn.pydata.org/generated/seaborn.catplot.html) or `relplot` (https://seaborn.pydata.org/generated/seaborn.relplot.html) in the seaborn library. Here is an example line of code from the documentation: `sns.relplot(data=tips, x="total_bill", y="tip", hue="day", col="time")` — David Erickson, Jan 19 '21 at 19:13

apaolillo · Accepted Answer · 2021-01-26T09:55:07.857

Use the same code you wrote, but replace hue='name' with col='name'. That should give you the expected behavior:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

dec = pd.DataFrame(
    [
        [1,'harry',100],
        [1,'joe',20],
        [2,'harry',50],
        [3,'joe',60],
        [3,'lee',25],
        [4,'lee',60],
        [4,'harry',200],
        [4,'joe',90],        
    ],
    columns=['date','name','amount'],
)

sns.lmplot(
    x='date',
    y='amount',
    data= dec,
    fit_reg=False,
    col='name',
    legend=True,
    palette='Set1',
)
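
lmplot shares both axes across panels by default. If only the date axis should be shared, one possible sketch uses relplot (mentioned in the comments above) and passes sharex/sharey through facet_kws. The col_wrap=2 value and the Agg backend line are assumptions made for this illustration, not part of the original code:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import pandas as pd
import seaborn as sns

# Same sample data as in the question.
dec = pd.DataFrame(
    [[1, "harry", 100], [1, "joe", 20], [2, "harry", 50], [3, "joe", 60],
     [3, "lee", 25], [4, "lee", 60], [4, "harry", 200], [4, "joe", 90]],
    columns=["date", "name", "amount"],
)

# One scatter panel per name. The x-axis (date) is shared between panels,
# while each panel gets its own y-axis range.
g = sns.relplot(
    data=dec,
    x="date",
    y="amount",
    col="name",
    col_wrap=2,  # illustrative value: two panels per row
    kind="scatter",
    facet_kws={"sharex": True, "sharey": False},
)
```

With sharey left at its default, every panel would use the same amount range instead, which makes panels comparable but can flatten names with small amounts.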

If you want to break the plots across several rows, set col_wrap (the number of plots per row):

sns.lmplot(
    x='date',
    y='amount',
    data= dec,
    fit_reg=False,
    col='name',
    col_wrap=1,
    legend=True,
    palette='Set1',
)
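
With 432 names, a single figure with one plot per row gets very tall. A rough alternative, sketched here with plain matplotlib, is to draw the names in batches, one figure per batch, with the panels of each figure sharing the x-axis. The helper name plot_in_batches, the per_figure parameter, and the figure size are hypothetical choices for this illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd


def plot_in_batches(df, per_figure=20):
    """Hypothetical helper: one figure per batch of names, stacked panels sharing x."""
    names = sorted(df["name"].unique())
    x_min, x_max = df["date"].min(), df["date"].max()
    figures = []
    for start in range(0, len(names), per_figure):
        batch = names[start:start + per_figure]
        fig, axes = plt.subplots(
            nrows=len(batch), ncols=1, sharex=True, squeeze=False,
            figsize=(6, 2 * len(batch)),
        )
        for ax, name in zip(axes[:, 0], batch):
            rows = df[df["name"] == name]
            ax.scatter(rows["date"], rows["amount"])
            ax.set_title(f"name = {name}")
            ax.set_ylabel("amount")
        # Use the same date range in every figure so batches stay comparable.
        axes[-1, 0].set_xlim(x_min - 0.5, x_max + 0.5)
        axes[-1, 0].set_xlabel("date")
        figures.append(fig)
    return figures


dec = pd.DataFrame(
    [[1, "harry", 100], [1, "joe", 20], [2, "harry", 50], [3, "joe", 60],
     [3, "lee", 25], [4, "lee", 60], [4, "harry", 200], [4, "joe", 90]],
    columns=["date", "name", "amount"],
)

# Two names per figure here, only to keep the example small.
figures = plot_in_batches(dec, per_figure=2)
```

Each returned figure can then be written to disk with its savefig method, which avoids having to display one oversized image.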

EDIT: using the groupby() method, you can easily get aggregates such as the number of dots per plot and the total amount per group. The main idea is to group the records in the dec dataframe by name (as was implicitly done in the plot above).

Continuing on the code above, you can have a preview of the groupby operation using the describe method:

dec.groupby('name').describe()

Out[2]: 
       date                                                amount                                                          
      count      mean       std  min   25%  50%   75%  max  count        mean        std   min    25%    50%     75%    max
name                                                                                                                       
harry   3.0  2.333333  1.527525  1.0  1.50  2.0  3.00  4.0    3.0  116.666667  76.376262  50.0  75.00  100.0  150.00  200.0
joe     3.0  2.666667  1.527525  1.0  2.00  3.0  3.50  4.0    3.0   56.666667  35.118846  20.0  40.00   60.0   75.00   90.0
lee     2.0  3.500000  0.707107  3.0  3.25  3.5  3.75  4.0    2.0   42.500000  24.748737  25.0  33.75   42.5   51.25   60.0

Using the pandas groupby method, we group records by 'name' and pick any column (here: 'amount') to get the count. Since count() counts the non-null values for each 'name', the result is the same for every column as long as no values are missing:

counts = dec.groupby('name')['amount'].count()
counts

Out[3]: 
name
harry    3
joe      3
lee      2
Name: amount, dtype: int64
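
One caveat, in case the real december.csv has gaps: count() skips missing values, whereas size() counts every row. A small sketch with toy data (the missing amount for 'lee' is invented for illustration and is not in the question's data):

```python
import numpy as np
import pandas as pd

# Toy data; the NaN amount is an assumption made to show the difference.
demo = pd.DataFrame(
    {
        "name": ["harry", "harry", "lee", "lee"],
        "amount": [100, 50, 25, np.nan],
    }
)

non_null = demo.groupby("name")["amount"].count()  # skips the NaN row
all_rows = demo.groupby("name").size()             # counts every row

print(non_null)
print(all_rows)
```

Since a scatter plot cannot draw a point whose amount is missing, count() on the plotted column is probably the closer match for "number of dots".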

To get the total amount, we do the same: we pick the 'amount' column and call the sum() method instead of the count() method:

total_amounts = dec.groupby('name')['amount'].sum()
total_amounts

Out[4]: 
name
harry    350
joe      170
lee       85
Name: amount, dtype: int64
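
As a possible shortcut, both aggregates can be computed in one pass with agg(), which returns a single table indexed by 'name'. The variable name summary is just an illustrative choice:

```python
import pandas as pd

# Same sample data as in the question.
dec = pd.DataFrame(
    [[1, "harry", 100], [1, "joe", 20], [2, "harry", 50], [3, "joe", 60],
     [3, "lee", 25], [4, "lee", 60], [4, "harry", 200], [4, "joe", 90]],
    columns=["date", "name", "amount"],
)

# One row per name, with a 'count' column and a 'sum' column.
summary = dec.groupby("name")["amount"].agg(["count", "sum"])
print(summary)
```

Individual values can then be looked up with summary.loc[name, 'count'] and summary.loc[name, 'sum'].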

We now have two series indexed by 'name' containing the information we want: counts and total_amounts. We're gonna use these two series to build a title for each subplot:

plot = sns.lmplot(
    x='date',
    y='amount',
    data=dec,
    fit_reg=False,
    col='name',
    legend=True,
    palette='Set1',
)

for name in plot.axes_dict:
    subplot_title = f'name = {name}, number of dots = {counts[name]}, total amount = {total_amounts[name]}'
    plot.axes_dict[name].set_title(subplot_title)

plot.fig

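It displays one subplot per name, each titled with the name, the number of dots, and the total amount (for example: name = harry, number of dots = 3, total amount = 350).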

Thanks! @apaolilo What if I also want to show number of counts with the name? — Shashwat Awasthi, Jan 23 '21 at 03:51
by "number of counts", do you mean the sum of the `amount` value for each name? Or the number of times it occurs in the dataframe (number of dots on each subplot)? — apaolillo, Jan 25 '21 at 10:53
@ShashwatAwasthi I edited the answer to explain how to do both — apaolillo, Jan 26 '21 at 09:55
