My program imports these:

import requests
import demjson

import pandas as pd
from pandas import DataFrame

import pylab
pylab.show()

I have a DataFrame that looks like this when printed:

    Strike    COI    POI
0    50.00    927   1694
1    55.00    394   1898
2    60.00   2042   4438
3    65.00    642   3696
4    70.00   3169   3216
5    75.00   2529   3222
6    80.00   6268  14029
7    85.00   3988   6241
8    87.50    356   1516
9    90.00  15676  14345
10   92.50   1309   2498
11   95.00   3303  11391
12   97.50   1074   1472
13  100.00  64930  19513
14  105.00  10953   9286
15  110.00  19956  13008
16  115.00  13956  12932
17  120.00  23440   9240
18  125.00  12167   7467
19  130.00  23531  10168
20  135.00   9567   2637
21  140.00  18967   6854
22  145.00   7890   5176
23  150.00  21516   8079
24  155.00   3137    267
25  160.00   4115    432
26  165.00   1079    205
27  170.00   4341    785
28  175.00   6277   1631
29  180.00   1805     35
30  185.00    906    136
31  190.00   1984    377
32  195.00   3539    268

Sometimes there are zero values, like this:

    Strike   COI   POI
0    95.00    53   663
1   100.00    16   595
2   105.00     6   377
3   110.00    56  1217
4   115.00   174   994
5   120.00   631  3227
6   125.00   701  1031
7   130.00  2678   833
8   135.00  1921  1049
9   140.00  1238    10
10  160.00  1486     0
11  165.00  1900     0

Unfortunately, sometimes the Strike is a fractional value, like this:

    Strike    COI    POI
0    34.29    476  12711
1    35.71     95   7782
2    37.14      0   7844
3    38.57      0   3640
4    40.00     93   6010
5    41.43      0   5621
6    42.86   1245  18146
7    44.29    116   6844
8    45.71    140   7099
9    47.14    500    483
10   48.57    445   3956
11   50.00   1540  22362
12   51.43    152   6366
13   52.86    131   8354
14   54.29    810   7542
15   55.71    132   9337
16   57.14  12455  15024
17   58.57    662   5245
18   60.00   1743   9116
19   61.43   1368   7236
20   62.86   1128  11890
21   64.29   4537  24204
22   65.71    766   5113
23   67.14   1859  10572
24   68.57  12407  11367
25   70.00  13263  11748
26   71.43  23400  31566
27   72.86   2784  12984
28   74.29  12679  20520
29   75.71   6932  14617
..     ...    ...    ...
63  115.00  39738  18033
64  115.71   5293   2877
65  116.43   1874   2748
66  117.14   4181   1965
67  117.86   3618   4214
68  118.57  11652   4043
69  120.00  81523  34752
70  121.43  14239   3527
71  122.86   9046   6160
72  125.00    187     88
73  125.71  22557   7381
74  128.57  11053   8163
75  130.00  74007  27825
76  131.43   6747   1951
77  132.86   7289   1383
78  134.29   5872   1380
79  135.71   4946   2047
80  137.14   5349    590
81  140.00  98310  57767
82  145.00   9857    403
83  150.00  64701   2063
84  155.00  17398   1434
85  160.00  12363   1133
86  165.00   5222    539
87  170.00   9050    918
88  175.00   9848    678
89  180.00   3408     85
90  185.00   3243    768
91  190.00   3646    419
92  195.00   4789    149

Since I want the Strikes to be the bins, I tried to plot a histogram with:

df.hist(by=df.Strike)

but I either get nothing, or, when Spyder does get ready to plot and shows a grid of small subplots, I get the error below before anything is drawn. As far as I can see, every DataFrame has at least one data point. The y-axis also doesn't make sense, since the bar height always appears to be one:

Traceback (most recent call last):

  File "<ipython-input-20-6f27fa6cf56c>", line 1, in <module>
    runfile('/home/idf/goog.py', wdir='/home/idf')

  File "/home/idf/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "/home/idf/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
    builtins.execfile(filename, *where)

  File "/home/idf/goog.py", line 153, in <module>
    df.hist(by=df.Strike)

  File "/home/idf/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2740, in hist_frame
    **kwds)

  File "/home/idf/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2873, in grouped_hist
    figsize=figsize, layout=layout, rot=rot)

  File "/home/idf/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2983, in _grouped_plot
    plotf(group, ax, **kwargs)

  File "/home/idf/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2867, in plot_group
    ax.hist(group.dropna().values, bins=bins, **kwargs)

  File "/home/idf/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 5597, in hist
    raise ValueError("x must have at least one data point")

ValueError: x must have at least one data point
Ivan
  • Which library/package are you using? – Bhargav Rao Jun 10 '15 at 17:57
  • Have you looked at http://stackoverflow.com/questions/19584029/plotting-histograms-from-grouped-data-in-a-pandas-dataframe? He's using Pandas by his talk of dataframes, Bhargav – bbill Jun 10 '15 at 18:02
  • I am just using Anaconda. I modified the post to show the imports. – Ivan Jun 10 '15 at 18:03
  • I think you are misusing `by` there. Perhaps you want this: `df['Strike'].hist() ` – JohnE Jun 10 '15 at 20:44
  • Btw, I believe `pylab` is not recommended anymore with ipython. You probably want to replace with `import matplotlib.pyplot as plt` and `%matplotlib inline` – JohnE Jun 10 '15 at 20:59
  • @johnE that does get me further since now I have just one plot. How does it know which of the two other fields to use, COI or POI, for the y-axis? – Ivan Jun 10 '15 at 22:32
  • I get (, TypeError('coercing to Unicode: need string or buffer, float found',), ) – Ivan Jun 10 '15 at 22:32
  • histogram only has one field. The y-axis is just a count (frequency). – JohnE Jun 11 '15 at 02:28
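
To illustrate the last comment: a histogram looks at a single column, and its y-axis is the number of rows that fall into each bin. Grouping by Strike, on the other hand, gives one group per unique strike, which is consistent with the bars always having a height of one. The following is only a sketch; the small frame reuses the first six rows shown in the question, and `bins=3` is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Sample copied from the first rows of the question's DataFrame.
df = pd.DataFrame({
    "Strike": [50.0, 55.0, 60.0, 65.0, 70.0, 75.0],
    "COI": [927, 394, 2042, 642, 3169, 2529],
    "POI": [1694, 1898, 4438, 3696, 3216, 3222],
})

# A histogram uses one column only: it counts how many values land in
# each bin. COI and POI play no part in this computation.
counts, edges = np.histogram(df["Strike"].values, bins=3)

# Grouping by Strike yields one group per unique strike value,
# and here each group contains exactly one row.
group_sizes = df.groupby("Strike").size()

print(counts)       # number of strikes per bin
print(group_sizes)  # rows per unique strike
```

Here `counts` sums to the number of rows, whatever the values of COI and POI are.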

1 Answer

When you call the DataFrame.hist method (i.e. pandas' built-in plotting function), you only need to pass a column name:

df.hist('Strike') # which is the same as df.hist(column='Strike')

To get:

[Image: histogram of the Strike column]

If you use plt.hist instead (calling the matplotlib function directly), you need to pass df.Strike.values.
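
As a rough sketch of the two calls side by side, assuming a small frame built from the first six strikes in the question: the `Agg` backend line is only there so the snippet runs without a display and can be dropped in Spyder or IPython, and `bins=3` is an arbitrary illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit when plotting interactively
import matplotlib.pyplot as plt
import pandas as pd

# Sample strikes copied from the first rows of the question's DataFrame.
df = pd.DataFrame({"Strike": [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]})

# matplotlib directly: pass the raw values, not a column name.
plt.figure()
counts, edges, patches = plt.hist(df.Strike.values, bins=3)
plt.xlabel("Strike")
plt.ylabel("Count")

# pandas wrapper from the answer: pass the column name instead.
axes = df.hist("Strike", bins=3)
```

In both cases the y-axis is a count of strikes per bin, so the COI and POI columns are not involved.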

Primer