I have a multivariate dataset (the original post showed it as an image).

My goal is to generate a box plot to visualize the distribution of values in Treat1, Treat2, Treat3 and Treat4.

I can get a bar chart that shows exactly the layout I want, based on "How to add group labels for bar charts in matplotlib?" (the chart was shown as an image).

However, I need a box plot so that I can look at the distribution, including the mean and the outliers, for each Treat group. Here again is the code that generates the bar chart; it is based on the Stack Overflow code by https://stackoverflow.com/users/2846871/varicus

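import matplotlib.pyplot as plt

# df2 is the dataset described above; label_group_bar_table is the helper from the linked answer
df = df2.groupby(['Group','Category','Day ']).sum()
fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(111)
df.plot(kind='bar',stacked=False,ax=ax)
# Blank out the default tick labels; the helper draws the grouped labels instead
labels = ['' for item in ax.get_xticklabels()]
ax.set_xticklabels(labels)
ax.set_xlabel('')
label_group_bar_table(ax, df)
# Leave room below the axes for one label row per index level
fig.subplots_adjust(bottom=.1*df.index.nlevels)
plt.show()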

What would be the best way to generate a box plot in place of each bar?

Pearl

2 Answers

Maybe this is what you are looking for.

import numpy as np
import pandas as pd

# A dataframe similar to yours, so that others can offer other solutions.

group = [1,1,1,1,2,2,4,4, 2, 2, 3, 2, 2, 4, 4, 5, 5, 5, 5, 5, 5, 1, 1, 1, 1, 1,1]
category = [0,0,1,1,0,0,0,0,1,1,1,1,2,2,2,2,3,3,0,0,1,1,0,0,2,2,2]
Treat1 = np.random.randint(0, 100, size= len(group))
Treat2 = np.random.randint(0, 100, size= len(group))
Treat3 = np.random.randint(0, 100, size= len(group))
Treat4 = np.random.randint(0, 100, size= len(group))
df = pd.DataFrame.from_dict({'Group': group, "Category": category, "Treat1": Treat1,"Treat2": Treat2,"Treat3": Treat3,"Treat4": Treat4, })

# One subplot per Treat column, one box per (Group, Category) pair
f = df.boxplot(by = ['Group','Category'],figsize = (12,8))

This results in a grid of box plots (shown as an image in the original post).
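
To make the layout concrete, here is a minimal sketch of how `by` maps onto the figure: one subplot per remaining numeric column, and one box per distinct (Group, Category) pair. The `demo` frame, the fixed seed, and the `Agg` backend line are assumptions of mine for a repeatable, headless run; they are not part of the data in the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for repeatability
group = [1, 1, 1, 1, 2, 2, 2, 2]
category = [0, 0, 1, 1, 0, 0, 1, 1]
demo = pd.DataFrame({
    "Group": group,
    "Category": category,
    "Treat1": rng.integers(0, 100, size=len(group)),
    "Treat2": rng.integers(0, 100, size=len(group)),
})

# Number of distinct (Group, Category) pairs = number of boxes in each subplot.
n_boxes = demo.groupby(["Group", "Category"]).ngroups

# Without `column=`, pandas plots every numeric column not listed in `by`.
demo.boxplot(by=["Group", "Category"], figsize=(8, 4))
fig = plt.gcf()
n_subplots = len(fig.axes)  # one subplot per Treat column

print(n_boxes, n_subplots)
```

Adding a further column name to the `by` list splits each box again by that column's values.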

plasmon360
  • This graph doesn't take into account column A, which is the day. I want the days side by side so that I can compare how the Group values on day 0 compare with those on day 20. – Pearl Feb 12 '19 at 22:50

I added part of the code, which does the job; however, I wanted each category to have its own face color. As it stands, I can't distinguish between the categories in the graph.

fig, ax = plt.subplots(figsize = (20,8))


# Note: showfliers=False is more readable than sym='', but requires a recent version iirc
bp = df_Dummy.boxplot(by = ['Group','Category','Day '],ax=ax,
                      sym='',rot=90,return_type='dict',patch_artist=False)


[[item.set_linewidth(2) for item in bp[key]['boxes']] for key in bp.keys()]


[[item.set_linewidth(2) for item in bp[key]['fliers']] for key in bp.keys()]
[[item.set_linewidth(2) for item in bp[key]['medians']] for key in bp.keys()]
[[item.set_linewidth(2) for item in bp[key]['means']] for key in bp.keys()]
[[item.set_linewidth(2) for item in bp[key]['whiskers']] for key in bp.keys()]
[[item.set_linewidth(2) for item in bp[key]['caps']] for key in bp.keys()]
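colors = ['pink', 'lightblue', 'lightgreen','yellow']
# Cycle the palette over the boxes. With patch_artist=False each box is a Line2D,
# so this colors the outline only; face colors need patch_artist=True and set_facecolor.
[[item.set_color(colors[i % len(colors)]) for i, item in enumerate(bp[key]['boxes'])] for key in bp.keys()]
# No visible effect here, because sym='' hides the fliers
[[item.set_color('b') for item in bp[key]['fliers']] for key in bp.keys()]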
[[item.set_color('m') for item in bp[key]['medians']] for key in bp.keys()]
[[item.set_markerfacecolor('k') for item in bp[key]['means']] for key in bp.keys()]
[[item.set_color('c') for item in bp[key]['whiskers']] for key in bp.keys()]
[[item.set_color('y') for item in bp[key]['caps']] for key in bp.keys()]

ax.margins(y=0.05)
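
For the face colors, one possible approach is to call matplotlib's `ax.boxplot` directly with `patch_artist=True` and pick the color from the Category part of each group key. This is only a sketch under my own assumptions: `df_demo`, the `palette` mapping, the column name `Day` (without the trailing space), and the single `Treat1` column are all made up for illustration, and the `Agg` line is there only so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 48
df_demo = pd.DataFrame({
    "Group": np.repeat([1, 2], n // 2),
    "Category": np.tile(np.repeat([0, 1], 6), 4),
    "Day": np.tile(np.repeat([0, 20], 3), 8),
    "Treat1": rng.integers(0, 100, n),
})

palette = {0: "pink", 1: "lightblue"}  # hypothetical: one face color per Category

# Collect one sample array per (Group, Category, Day) combination, in sorted key order.
keys, samples = [], []
for key, sub in df_demo.groupby(["Group", "Category", "Day"]):
    keys.append(key)
    samples.append(sub["Treat1"].to_numpy())

fig, ax = plt.subplots(figsize=(10, 4))
# patch_artist=True makes each box a filled patch, so set_facecolor is available.
bp = ax.boxplot(samples, patch_artist=True, showfliers=False)
for box, (grp, cat, day) in zip(bp["boxes"], keys):
    box.set_facecolor(palette[cat])

ax.set_xticks(range(1, len(keys) + 1))
ax.set_xticklabels(["G{} C{} D{}".format(*k) for k in keys], rotation=90)
ax.set_ylabel("Treat1")
fig.tight_layout()
```

Boxes that share a Category then share a fill color, and day 0 and day 20 sit next to each other within each (Group, Category) pair. Repeating the loop on one axes per Treat column would extend this to all four columns.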


Pearl