1

I'm a Python newbie. I'm trying to run some descriptive stats on my DB and I succeeded in drawing

this kind of graph

Here is an example of the DB so you can figure out the data structure:

Dataframe Example

Basically I'm interested in seeing how many customers - represented by the height of each bar - have bought 1,2,3...n times (x axis) in a given product family (in the case below is "WORKWEAR & PROTECTIVE CLOTHING") . Keep in mind that family is a dig in level of the column "product basket"). The lineplot is just the cumulative percentage.

The color coding is representing the type of customer I'm analyzing (SMB or CORP).

First thing I've built the df that is calculating the cumulative percentages in a dataframe:

n_purch = df1.groupby(['product_basket','family','SF_type','orders_count']) 
['ship_to_number'].nunique().to_frame().reset_index()
n_purch.rename(columns={'ship_to_number':'ship_to_count'}, inplace = True)

n_ship = df1.groupby(['product_basket','family','SF_type']) 
['ship_to_number'].nunique().to_frame().reset_index()
n_ship.rename(columns={'ship_to_number':'ship_to_total'}, inplace = True)

#merge and keep only relevant columns from n_ship dataframe
df_perc = pd.merge(n_purch,n_ship[['SF_type','family','ship_to_total']],how='left',on= 
['SF_type','family'])

df_perc['ship_to_perc'] = df_perc['ship_to_count'] / df_perc['ship_to_total']
df_perc["running_perc"] = df_perc.groupby(['SF_type','family'])['ship_to_perc'].cumsum()

df_perc['orders_count'] = df_perc['orders_count'].astype(int)

df_perc

Second thing is drawing all these into a seaborn barplot:

df_perc2=df_perc.query('family=="WORKWEAR & PROTECTIVE CLOTHING"')

fig, ax = plt.subplots()
ax_twin = ax.twinx()

#set boxplot general aspect
fig = plt.gcf()
fig.set_size_inches(30, 10)
sns.set_style("white")

#ship_to_count bars
barplot = sns.barplot(data=df_perc2, 
                  x = 'orders_count', 
                  hue = 'SF_type', 
                  y = 'ship_to_perc',
                  palette = "Set2",
                  ax = ax)
#cumulative % line
lineplot = sns.pointplot(data = df_perc2,
                     x = 'orders_count',
                     hue = 'SF_type',
                     y = 'running_perc',
                     palette = "Set2",
                     marker ='o',
                     ax = ax_twin,
                     legend = False)

#set tick stiles for x and y axis
barplot.set_xticklabels(barplot.get_xmajorticklabels(), fontsize = 18)
barplot.set_yticklabels(barplot.get_yticks().round(2), size = 18)
lineplot.set_yticklabels(lineplot.get_yticks().round(2), size = 18)

#set dynamic title
barplot.set_title('% Ship Tos by # orders for '+''.join(df_perc2['product_basket'].unique())
              +
              ' - '
              +
              ''.join(df_perc2['family'].unique()), fontdict = { 'fontsize': 30}, y = 1.05)


barplot.get_legend().remove()

#set constant line at 90%
plt.axhline(y=0.9, color='g', ls=':', lw=4, label='90th percentile')
lineplot.legend(loc='center right',fontsize='22')

it works just fine :) !

QUESTION: is there a way for the code to show me ALL sets of barplots + lineplots for ALL the families (column "family") instead of just selecting them one by one like i did in this passage in the second piece of code:

df_perc2=df_perc.query('family=="WORKWEAR & PROTECTIVE CLOTHING"')

Basically I want to be able to select the product basket and then having returned all graphs for all the families belonging to that specific product basket.

Thanks a lot in advance for helping me.

Stefano

P.S. sorry for the long description but it was important to give you the context in order for anyone who will help me to have an easier life in answering

  • It would be better to share a sample df that reflects your real df. Currently, we have to reverse-engineer its structure. See also [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Mr. T May 31 '21 at 11:18
  • Hey Mr T, thanks for replying to me. I've inserted a link to GitHub (first time I'm using it so go easy on me), should be visible so you can tell which is the df structure: [link] (https://github.com/StefanoPuccini/Stack-Overflow) – Stefano Puccini May 31 '21 at 13:36

0 Answers0