I have the following data frame:

    strterminationreason    total_trials    %Trials
0   Completed, Negative outcome/primary endpoint(s...   3130    6.390624
1   Completed, Outcome indeterminate    3488    7.121565
2   Completed, Outcome unknown  6483    13.236555
3   Completed, Positive outcome/primary endpoint(s...   15036   30.699498
4   Terminated, Business decision - Drug strategy ...   526 1.073952
5   Terminated, Business decision - Other   1340    2.735922
6   Terminated, Business decision - Pipeline repri...   1891    3.860917
7   Terminated, Early positive outcome  231 0.471640
8   Terminated, Lack of efficacy    1621    3.309649
9   Terminated, Lack of funding 533 1.088244
10  Terminated, Other   1253    2.558291
11  Terminated, Planned but never initiated 4441    9.067336
12  Terminated, Poor enrollment 3201    6.535587
13  Terminated, Safety/adverse effects  993 2.027441
14  Terminated, Unknown 4811    9.82277

I used the following code to plot a horizontal bar graph, since a vertical one doesn't leave room for the long text labels above.

df['%Trials']=(df.ix[:,1]/sum(df.ix[:,1]))*100

plt.figure(figsize=(35,20))
plt.barh(df.ix[:,2],df.index,align='edge')
plt.xlim([0,31])
plt.yticks(df.index, df.strterminationreason)
plt.ylabel("TerminationReason",fontsize=20)
plt.xlabel("%Trials",fontsize=20)

But in the output I get, the lengths of the bars don't reflect the actual % values in the data frame. For example, the highest % is for "Completed, Positive outcome/primary endpoint", but the plot doesn't show that. Any idea why?

Also, does anyone know how to fit the label text for each bar so that it is clean and doesn't overlap?

Baktaawar
  • Please check the sequence of arguments to [`barh`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.barh): `matplotlib.pyplot.barh(bottom, width, height=0.8, left=None, hold=None, **kwargs)` -- `bottom` is y-coordinate of your bars, `width` is width of bars on the x-axis – Maksim Yegorov Sep 11 '15 at 23:09
  • Didn't understand. Could you help? – Baktaawar Sep 11 '15 at 23:11

1 Answer

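The reason your plot appears incorrect is that you're passing the arguments to barh in reversed order: the first argument is the y-position of each bar and the second is its width (its length along the x-axis), but your call passes the percentages as positions and the index as widths. You can find the documentation for matplotlib.pyplot.barh here. Here's a slightly modified script that resolves your problem: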

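import matplotlib.pyplot as plt

# y-positions of the bars (one per row) and their lengths (the percentages)
bottom = range(len(df.index))
width = df['%Trials']
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)

# barh takes the y-positions first, then the bar widths
ax.barh(bottom, width,color='r',align='edge')
# put one tick per bar and label it with the termination reason
ax.set_yticks(bottom)
ax.set_yticklabels(df['strterminationreason'])

plt.show()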
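
For anyone who wants to try the argument order without the original data, here is a minimal self-contained sketch. It assumes a recent pandas and matplotlib, copies only a few rows from the question's table (labels kept exactly as printed there, including the truncated one), and writes to a made-up file name, `termination_reasons.png`. The `Agg` backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question's table (labels as printed there).
df = pd.DataFrame({
    "strterminationreason": [
        "Completed, Outcome indeterminate",
        "Completed, Outcome unknown",
        "Completed, Positive outcome/primary endpoint(s...",
        "Terminated, Lack of efficacy",
        "Terminated, Poor enrollment",
    ],
    "%Trials": [7.121565, 13.236555, 30.699498, 3.309649, 6.535587],
})

# One y-position per row: 0, 1, 2, ...
y_pos = list(range(len(df)))

fig, ax = plt.subplots(figsize=(10, 6))

# First argument: y-position of each bar. Second argument: bar length.
bars = ax.barh(y_pos, df["%Trials"], align="center")

# Label each bar with its termination reason.
ax.set_yticks(y_pos)
ax.set_yticklabels(df["strterminationreason"])
ax.set_xlabel("%Trials")
ax.set_ylabel("TerminationReason")

fig.tight_layout()  # leave room for the long tick labels
fig.savefig("termination_reasons.png")  # hypothetical output file name
```

Swapping the first two arguments of `barh` in this sketch reproduces the symptom from the question: the bar lengths then follow the row index instead of the percentages. Note also that `df.ix`, used in the question, was removed in pandas 1.0; selecting columns by name, as above, or with `df.iloc` avoids it.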

Regarding fitting your long labels, you may want to tweak the font, padding or wrap the lines for readability. See here and here.
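
As one possible way to wrap the lines, here is a rough sketch using Python's standard `textwrap` module. The helper names `wrap_labels` and `plot_wrapped` and the default width of 25 characters are made up for illustration and are not taken from the linked posts; tune the width and font size to your figure.

```python
import textwrap


def wrap_labels(labels, width=25):
    """Break each label into lines of at most `width` characters."""
    return [textwrap.fill(label, width=width) for label in labels]


def plot_wrapped(labels, values, width=25):
    """Horizontal bar chart with wrapped tick labels (needs matplotlib)."""
    # Imported here so that wrap_labels stays usable without matplotlib.
    import matplotlib.pyplot as plt

    y_pos = list(range(len(labels)))
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.barh(y_pos, values, align="center")
    ax.set_yticks(y_pos)
    # Multi-line strings are drawn as multi-line tick labels.
    ax.set_yticklabels(wrap_labels(labels, width), fontsize=9)
    fig.tight_layout()  # widen the left margin to fit the labels
    return fig, ax


# Two labels taken from the question's table.
labels = [
    "Terminated, Planned but never initiated",
    "Terminated, Safety/adverse effects",
]
wrapped = wrap_labels(labels, width=20)
print(wrapped)
```

With the data frame from the question, a call would look like `plot_wrapped(list(df['strterminationreason']), df['%Trials'])`, followed by `plt.show()` or `fig.savefig(...)`.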

Community