I'm a Python newbie. I'm trying to run some descriptive stats on my DB and I succeeded in drawing
Here is an example of the DB so you can figure out the data structure:
Basically I'm interested in seeing how many customers - represented by the height of each bar - have bought 1,2,3...n times (x axis) in a given product family (in the case below is "WORKWEAR & PROTECTIVE CLOTHING") . Keep in mind that family is a dig in level of the column "product basket"). The lineplot is just the cumulative percentage.
The color coding is representing the type of customer I'm analyzing (SMB or CORP).
First thing I've built the df that is calculating the cumulative percentages in a dataframe:
n_purch = df1.groupby(['product_basket','family','SF_type','orders_count'])
['ship_to_number'].nunique().to_frame().reset_index()
n_purch.rename(columns={'ship_to_number':'ship_to_count'}, inplace = True)
n_ship = df1.groupby(['product_basket','family','SF_type'])
['ship_to_number'].nunique().to_frame().reset_index()
n_ship.rename(columns={'ship_to_number':'ship_to_total'}, inplace = True)
#merge and keep only relevant columns from n_ship dataframe
df_perc = pd.merge(n_purch,n_ship[['SF_type','family','ship_to_total']],how='left',on=
['SF_type','family'])
df_perc['ship_to_perc'] = df_perc['ship_to_count'] / df_perc['ship_to_total']
df_perc["running_perc"] = df_perc.groupby(['SF_type','family'])['ship_to_perc'].cumsum()
df_perc['orders_count'] = df_perc['orders_count'].astype(int)
df_perc
Second thing is drawing all these into a seaborn barplot:
df_perc2=df_perc.query('family=="WORKWEAR & PROTECTIVE CLOTHING"')
fig, ax = plt.subplots()
ax_twin = ax.twinx()
#set boxplot general aspect
fig = plt.gcf()
fig.set_size_inches(30, 10)
sns.set_style("white")
#ship_to_count bars
barplot = sns.barplot(data=df_perc2,
x = 'orders_count',
hue = 'SF_type',
y = 'ship_to_perc',
palette = "Set2",
ax = ax)
#cumulative % line
lineplot = sns.pointplot(data = df_perc2,
x = 'orders_count',
hue = 'SF_type',
y = 'running_perc',
palette = "Set2",
marker ='o',
ax = ax_twin,
legend = False)
#set tick stiles for x and y axis
barplot.set_xticklabels(barplot.get_xmajorticklabels(), fontsize = 18)
barplot.set_yticklabels(barplot.get_yticks().round(2), size = 18)
lineplot.set_yticklabels(lineplot.get_yticks().round(2), size = 18)
#set dynamic title
barplot.set_title('% Ship Tos by # orders for '+''.join(df_perc2['product_basket'].unique())
+
' - '
+
''.join(df_perc2['family'].unique()), fontdict = { 'fontsize': 30}, y = 1.05)
barplot.get_legend().remove()
#set constant line at 90%
plt.axhline(y=0.9, color='g', ls=':', lw=4, label='90th percentile')
lineplot.legend(loc='center right',fontsize='22')
it works just fine :) !
QUESTION: is there a way for the code to show me ALL sets of barplots + lineplots for ALL the families (column "family") instead of just selecting them one by one like i did in this passage in the second piece of code:
df_perc2=df_perc.query('family=="WORKWEAR & PROTECTIVE CLOTHING"')
Basically I want to be able to select the product basket and then having returned all graphs for all the families belonging to that specific product basket.
Thanks a lot in advance for helping me.
Stefano
P.S. sorry for the long description but it was important to give you the context in order for anyone who will help me to have an easier life in answering