
I am struggling to set xlim for each histogram and to create a single column of graphs so that the x-axis ticks are aligned. Being new to pandas, I am unsure how to apply the answer to: Overlaying multiple histograms using pandas.

from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd

df=DataFrame({'score0':[0.047771,0.044174,0.044169,0.042892,0.036862,0.036684,0.036451,0.035530,0.034657,0.033666],
              'score1':[0.061010,0.054999,0.048395,0.048327,0.047784,0.047387,0.045950,0.045707,0.043294,0.042243]})

print(df)
     score0    score1
0  0.047771  0.061010
1  0.044174  0.054999
2  0.044169  0.048395
3  0.042892  0.048327
4  0.036862  0.047784
5  0.036684  0.047387
6  0.036451  0.045950
7  0.035530  0.045707
8  0.034657  0.043294
9  0.033666  0.042243

df.hist()
plt.xlim(-1.0,1.0)

The result sets the x-axis limits to [-1,1] on only one of the two subplots.

I'm very familiar with ggplot in R and am just trying out pandas/matplotlib in Python. I'm open to suggestions for better plotting ideas. Any help would be greatly appreciated.
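Since the question mentions overlaying histograms, one alternative to separate subplots is to draw both columns on a single Axes with shared bin edges, which keeps the x-axis aligned by construction. This is a minimal sketch, assuming a pandas version where `DataFrame.plot.hist` is available; the `alpha` value and the axis label are arbitrary choices, not taken from the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "score0": [0.047771, 0.044174, 0.044169, 0.042892, 0.036862,
               0.036684, 0.036451, 0.035530, 0.034657, 0.033666],
    "score1": [0.061010, 0.054999, 0.048395, 0.048327, 0.047784,
               0.047387, 0.045950, 0.045707, 0.043294, 0.042243],
})

# One set of 11 edges (10 bins) covering the values of both columns
edges = np.histogram(df.values.ravel(), bins=10)[1]

# Both columns drawn on the same Axes; alpha makes the overlap visible
ax = df.plot.hist(bins=edges, alpha=0.5)
ax.set_xlabel("score")
```

This is closer to a ggplot-style `position = "identity"` overlay than to facets; whether it reads better than stacked subplots depends on how much the two distributions overlap.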


update #1 (@ct-zhu):

I have tried the following, but setting xlim on each subplot does not seem to spread the bins across the new x-axis range. As a result, the graph now has odd bin widths and still has more than one column of graphs:

for array in df.hist(bins=10):
    for subplot in array:
        subplot.set_xlim((-1,1))


update #2:

Getting closer with the use of layout, but the bin width does not equal the interval length divided by the bin count. In the example below, I set bins=10, so each bin over the interval [-1,1] should be 2/10 = 0.20 wide; however, the graph does not have any bins with a width of 0.20.
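The behaviour in this update comes from how an integer `bins` is turned into bin edges. A minimal sketch using NumPy's `np.histogram`, assuming (as matplotlib's `Axes.hist` does by default) that an integer bin count is spread over the minimum and maximum of the data rather than over any limits set later with `set_xlim`:

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "score0": [0.047771, 0.044174, 0.044169, 0.042892, 0.036862,
               0.036684, 0.036451, 0.035530, 0.034657, 0.033666],
    "score1": [0.061010, 0.054999, 0.048395, 0.048327, 0.047784,
               0.047387, 0.045950, 0.045707, 0.043294, 0.042243],
})

# With an integer, the 10 bins are spread over [min, max] of the data,
# not over the x-axis range chosen afterwards
counts, edges = np.histogram(df["score0"], bins=10)
widths = np.diff(edges)

print(edges[0], edges[-1])  # the data's min and max
print(widths[0])            # (max - min) / 10, about 0.0014 rather than 0.2
```

Changing the axis limits afterwards only zooms out; it does not recompute the edges, which is why the bars look oddly narrow.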

for array in df.hist(layout=(2,1),bins=10):
    for subplot in array:
        subplot.set_xlim((-1,1))


Asked by blehman

1 Answer


There are two subplots, and you can access each of them and modify them separately:

ax_list=df.hist()
ax_list[0][0].set_xlim((0,1))
ax_list[0][1].set_xlim((0.01, 0.07))


Calling plt.xlim changes the limits of the current axes only. In this case that is the second subplot, because it was the most recently created.
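A minimal sketch of the difference, assuming a fresh figure so that the "current" Axes is the last subplot `df.hist` created; it records the limits right after `plt.xlim` and then applies the limits to every subplot with a loop over the flattened array:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "score0": [0.047771, 0.044174, 0.044169, 0.042892, 0.036862,
               0.036684, 0.036451, 0.035530, 0.034657, 0.033666],
    "score1": [0.061010, 0.054999, 0.048395, 0.048327, 0.047784,
               0.047387, 0.045950, 0.045707, 0.043294, 0.042243],
})

ax_list = df.hist()  # 2-D array of Axes; shape (1, 2) for two columns

# plt.xlim acts only on the current Axes, i.e. the last subplot created
plt.xlim(-1.0, 1.0)
after_plt_xlim = [ax.get_xlim() for ax in ax_list.ravel()]

# To change every subplot, flatten the array and set the limits per Axes
for ax in ax_list.ravel():
    ax.set_xlim(-1.0, 1.0)
```

Looping over `ax_list.ravel()` avoids hard-coding indices such as `[0][1]`, so the same code keeps working if the layout changes.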


Edit:

To arrange the plots in 2 rows and 1 column, use the layout argument. To align the bin edges, use the bins argument. Setting the x limits to (-1, 1) is probably not a good idea, since your numbers are all small (roughly 0.03 to 0.06).

import numpy as np
ax_list=df.hist(layout=(2,1),bins=np.histogram(df.values.ravel())[1])
ax_list[0][0].set_xlim((0.01, 0.07))
ax_list[1][0].set_xlim((0.01, 0.07))
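A variation on the same idea, sketched under the assumption that your pandas version's `DataFrame.hist` accepts `sharex` (it is a documented parameter in current releases): linking the x-axes of the subplots means a single `set_xlim` call is enough, and the tick positions stay aligned between the two rows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "score0": [0.047771, 0.044174, 0.044169, 0.042892, 0.036862,
               0.036684, 0.036451, 0.035530, 0.034657, 0.033666],
    "score1": [0.061010, 0.054999, 0.048395, 0.048327, 0.047784,
               0.047387, 0.045950, 0.045707, 0.043294, 0.042243],
})

# Common edges over all values of both columns, as in the answer
edges = np.histogram(df.values.ravel())[1]

# sharex=True links the x-axes, so setting the limits on one Axes
# also updates the other row
ax_list = df.hist(layout=(2, 1), bins=edges, sharex=True)
ax_list[0][0].set_xlim(0.01, 0.07)
```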


Or specify exactly 10 bins between -1 and 1, which takes 11 bin edges:

ax_list=df.hist(layout=(2,1),bins=np.linspace(-1,1,11))
ax_list[0][0].set_xlim((-1,1))
ax_list[1][0].set_xlim((-1,1))
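When `bins` is a sequence, it is read as bin edges, so N bins need N + 1 edges, and `np.linspace(start, stop, num)` returns `num` points. A short sketch of the off-by-one, also checking where the question's scores land (the `scores` array below is simply both columns of the question's data):

```python
import numpy as np

# np.linspace(start, stop, num) returns num points, i.e. num - 1 intervals
edges_9 = np.linspace(-1, 1, 10)   # 10 edges -> 9 bins, each about 2/9 wide
edges_10 = np.linspace(-1, 1, 11)  # 11 edges -> 10 bins, each about 0.2 wide

# Both columns of the question's data
scores = np.array([
    0.047771, 0.044174, 0.044169, 0.042892, 0.036862,
    0.036684, 0.036451, 0.035530, 0.034657, 0.033666,
    0.061010, 0.054999, 0.048395, 0.048327, 0.047784,
    0.047387, 0.045950, 0.045707, 0.043294, 0.042243,
])

# Every score lies in [0, 0.2), so all counts end up in one bin
counts, _ = np.histogram(scores, bins=edges_10)
print(counts)
```

This is also why a [-1, 1] range with 0.2-wide bins shows a single tall bar for this data.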


Answered by CT Zhu
  • The subplot structure is very helpful, but the graphs are still in two columns. Also, what's going on with the bin width? See my update. – blehman Jun 01 '14 at 20:27
  • The layout option is very helpful. However, the bin setting seems to only apply to the range of data not the entire interval that we display using xlim. For example, say that I'd like to bucket the counts over [-1,1] with a total of 10 buckets; then the values from 0 to 0.2 should be in a single bucket, but that's not the case with bins=10. Any idea why not? – blehman Jun 02 '14 at 15:12
  • By the way, I placed in an update to clarify the question above. – blehman Jun 02 '14 at 15:29
  • Ok, I see. Use `bins=np.linspace(-1,1,11)` for the bin edges. – CT Zhu Jun 02 '14 at 15:48
  • So `bins` is not just looking for a number but rather a set of breakpoints. Can you add this solution in another edit so I can check this off as answered? – blehman Jun 03 '14 at 05:22
  • Yeah, you can use both, sort of like the `breaks` argument in `R` `hist`. – CT Zhu Jun 03 '14 at 05:29