
I am trying to avoid dead space on the y-axis of a stacked bar chart generated via cufflinks [plotly].

The data looks like this:

    delay_percentage
crane_delay_type_gkey   1.0      2.0      3.0        4.0         5.0       6.0  7.0 8.0 9.0 10.0    ... 18.0     19.0   20.0    21.0    22.0    23.0    24.0    25.0    26.0    27.0
  crane_gkey                                                                                    
         288     76.425626  1.846134    0.000000    0.701747    0.000000     0.000000   4.933820    0.939261    0.000000    0.000000    ... 1.338717     0.291495   0.421048    0.269903    0.151145    0.636970    6.395612    1.589187    0.000000    0.172738
         333    46.153846   0.000000    0.000000    0.000000    0.000000    0.000000    7.692308    0.000000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
         338    81.818182   0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
         345    75.000000   0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    12.500000   0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000

The code I used for cufflinks:

df.iplot(kind='barh', barmode='stack')

The plot looks like this: [image not available]

How do I remove the spaces between the bars, especially the big gap between y-axis values 288 and 333?

I have tried converting the crane_gkey values (the y-axis values) to strings, but it made no difference. Also, how would I increase the thickness of the bars in a cufflinks bar chart?

Aasheet Kumar

1 Answer


Why not just drop the empty (null) rows at the source, using pandas itself?

So here is my approach to this.

Start with a sample dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar"],
                    "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large"],
                    "D": [1, 2, 2, 0, 0, 4, 5, 6, 7]})

Pivoting it gives:

table = pd.pivot_table(df, values='D', index=['A', 'B'],
                     columns=['C'], aggfunc=np.sum)

Reference: pandas pivot_table documentation.

Output:

C       large   small
A   B       
bar one 4.0     5.0
    two 7.0     6.0
foo one 4.0     1.0
    two NaN     0.0

So if we remove the foo/two row, whose values are all zero or NaN, we get the correct plot. I do this with:

table = table.fillna(0)  # replace all NaN values with zero
table = table[(table.T != 0).any()]  # keep only rows with at least one non-zero value
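The same filter can be written without the transpose by testing each row directly with any(axis=1). A minimal sketch, assuming only pandas; drop_all_zero_rows is a hypothetical helper name, and the pivot uses the string aggfunc="sum" instead of np.sum:

```python
import pandas as pd


def drop_all_zero_rows(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: treat NaN as 0, then keep rows with any non-zero cell."""
    filled = table.fillna(0)
    # Row-wise test; selects the same rows as the (table.T != 0).any() idiom
    return filled[(filled != 0).any(axis=1)]


df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 0, 0, 4, 5, 6, 7]})
table = pd.pivot_table(df, values="D", index=["A", "B"],
                       columns=["C"], aggfunc="sum")
trimmed = drop_all_zero_rows(table)
print(trimmed)
```

Note that a row such as [1, -1] is kept, because the test is "any non-zero value", not "sum is zero".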

Output:

C       large   small
A   B       
bar one 4.0     5.0
    two 7.0     6.0
foo one 4.0     1.0

Finally, we can plot with cufflinks:

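import cufflinks as cf  # importing cufflinks attaches .iplot() to DataFrames
import plotly.offline as py_offline
plot = table.iplot(kind='barh', barmode='stack', asFigure=True)
py_offline.iplot(plot)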
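The question also asked how to make the bars thicker. In plotly that is governed by the layout attribute bargap (the fraction of each slot left empty, 0 to 1), and setting yaxis.type to "category" asks plotly to space the labels evenly instead of placing them on a numeric scale. A minimal sketch that builds the figure as a plain dict, assuming pandas; stacked_barh_figure and the 0.05 gap are illustrative choices, not part of the cufflinks API:

```python
import pandas as pd


def stacked_barh_figure(table: pd.DataFrame, bargap: float = 0.05) -> dict:
    """Hypothetical helper: one horizontal bar trace per column, stacked."""
    # Join MultiIndex keys such as ("bar", "one") into one string label
    labels = [" ".join(map(str, key)) if isinstance(key, tuple) else str(key)
              for key in table.index]
    traces = [{"type": "bar", "orientation": "h", "name": str(col),
               "x": table[col].tolist(), "y": labels}
              for col in table.columns]
    layout = {"barmode": "stack",
              "bargap": bargap,                # smaller gap -> thicker bars
              "yaxis": {"type": "category"}}  # evenly spaced, no numeric gaps
    return {"data": traces, "layout": layout}


# Numeric keys like the question's crane_gkey values (subset of its columns)
table = pd.DataFrame({1.0: [76.425626, 46.153846, 81.818182],
                      7.0: [4.933820, 7.692308, 0.0]},
                     index=[288, 333, 338])
fig = stacked_barh_figure(table)
```

plotly.offline.iplot accepts a plain dict as its figure argument, so fig can be rendered with py_offline.iplot(fig).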

Please try out this solution and let me know if this solves your issue!

Naren Murali
  • Sorry for the late reply, but yes, that would work. I got it to work after I changed the y-axis values to strings and added 'crane' before each number. Thank you for the answer! – Aasheet Kumar Jul 11 '18 at 15:43
  • Also, there were no values on the y-axis between 288 and 333; the chart just assumed there were, and simply changing the column to strings didn't help. So I added a text prefix to each number, and that seemed to fix it. – Aasheet Kumar Jul 11 '18 at 15:45
  • @AasheetKumar Glad to help, post your solution as the answer to your question! – Naren Murali Jul 11 '18 at 16:15
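The fix from the comments, relabelling each crane_gkey as something like "crane 288", can be applied to the index before plotting. Plotly appears to treat number-like strings as numbers when it auto-detects the axis type, which would explain why plain string conversion did not help. A minimal sketch, assuming pandas; prefix_index_labels and the exact "crane " prefix format are hypothetical:

```python
import pandas as pd


def prefix_index_labels(df: pd.DataFrame, prefix: str = "crane ") -> pd.DataFrame:
    """Hypothetical helper: turn numeric index keys into non-numeric labels."""
    # "288" may still be read as a number; "crane 288" cannot be
    return df.rename(index=lambda key: f"{prefix}{key}")


# A tiny stand-in for the question's data (crane_gkey rows, two delay types)
delays = pd.DataFrame({1.0: [76.425626, 46.153846, 81.818182, 75.0],
                       7.0: [4.933820, 7.692308, 0.0, 0.0]},
                      index=pd.Index([288, 333, 338, 345], name="crane_gkey"))
labelled = prefix_index_labels(delays)
```

After this, labelled.iplot(kind='barh', barmode='stack') should draw one evenly spaced bar per crane, since rename returns a new frame and leaves delays unchanged.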