Data
Below is the data frame I wish to represent as a histogram, with each row as a point. The result won't be interesting, since it will give me three bins of equal size, but that's OK for now, so read on!
>>> outer_df
  patient                         cell  product
0   Pat_1               22RV1_PROSTATE       12
1   Pat_1               DU145_PROSTATE       15
2   Pat_1  LN18_CENTRAL_NERVOUS_SYSTEM        9
3   Pat_2               22RV1_PROSTATE       12
4   Pat_2               DU145_PROSTATE       15
5   Pat_2  LN18_CENTRAL_NERVOUS_SYSTEM        9
6   Pat_3               22RV1_PROSTATE       12
7   Pat_3               DU145_PROSTATE       15
8   Pat_3  LN18_CENTRAL_NERVOUS_SYSTEM        9
Desired Result
Graph each row as a point on a histogram, but also be able to pick out a particular set of data (e.g. all points from all cells would be in purple below, those belonging to just DU145_PROSTATE would be in red, and those belonging to 22RV1_PROSTATE in blue) and graph this as an overlaid histogram. I've illustrated this with a graphic from the pandas docs:
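To make the goal concrete, here is a minimal sketch of the overlay drawn with matplotlib directly, bypassing the pandas plotting wrappers. The bin edges, colours, transparency, and output filename are my own assumptions, and I have not checked this against the old pandas/matplotlib versions that produced the errors in the attempts that follow.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the example frame from the Data section
outer_df = pd.DataFrame({
    "patient": [p for p in ("Pat_1", "Pat_2", "Pat_3") for _ in range(3)],
    "cell": ["22RV1_PROSTATE", "DU145_PROSTATE",
             "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
    "product": [12, 15, 9] * 3,
})

fig, ax = plt.subplots()
bins = list(range(8, 18))  # shared bin edges so the overlays line up (assumed)
counts = {}

# All rows, regardless of cell line
counts["all"], _, _ = ax.hist(outer_df["product"], bins=bins,
                              color="purple", alpha=0.5, label="all cells")

# Selected cell lines drawn on top of the same axes
for cell, color in [("DU145_PROSTATE", "red"), ("22RV1_PROSTATE", "blue")]:
    subset = outer_df.loc[outer_df["cell"] == cell, "product"]
    counts[cell], _, _ = ax.hist(subset, bins=bins, color=color,
                                 alpha=0.5, label=cell)

ax.legend()
fig.savefig("overlaid_hist.png")  # hypothetical output filename
```

Passing the same `bins` to every call is what keeps the bars aligned; without it each subset would get its own automatically chosen edges.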
Attempt 1
I first tried to use the hist method for DataFrames, but got an error and a blank 4x4 grid of histograms.
>>> outer_df.hist()
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1977, in hist_frame
    ax.hist(data[col].dropna().values, **kwds)
  File "/usr/lib/python3/dist-packages/matplotlib/axes.py", line 8099, in hist
    xmin = min(xmin, xi.min())
TypeError: unorderable types: str() < float()
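The TypeError above most likely comes from the string columns (patient and cell) being passed to matplotlib along with the numeric one, so that a str ends up compared with a float. A minimal sketch of restricting the histogram to numeric data first, assuming a pandas version that provides `select_dtypes`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import pandas as pd

# Rebuild the example frame from the Data section
outer_df = pd.DataFrame({
    "patient": [p for p in ("Pat_1", "Pat_2", "Pat_3") for _ in range(3)],
    "cell": ["22RV1_PROSTATE", "DU145_PROSTATE",
             "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
    "product": [12, 15, 9] * 3,
})

# Keep only numeric columns so no strings reach the histogram call
numeric_df = outer_df.select_dtypes(include="number")
axes = numeric_df.hist()
```

This only sidesteps the error; it still draws one subplot per column rather than an overlay.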
Attempt 2
Realizing that DataFrame.hist() "plots the histograms of the columns on multiple subplots", I moved away from it and tried outer_df.plot(kind='hist', stacked=True). Even though I took this directly from the docs, I'm stuck on this error:
>>> outer_df.plot(kind='hist', stacked=True)
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1612, in plot_frame
    raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist
Attempt 3 -- courtesy of @816
>>> outer_df.set_index(['patient', 'cell']).unstack('cell').plot(kind='hist', stacked=True)
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1612, in plot_frame
    raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist
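As far as I can tell, "Invalid chart type given hist" means the installed pandas predates support for kind='hist' in DataFrame.plot (I believe it arrived in 0.15.0, but treat that version number as an assumption). On a newer pandas, a sketch along these lines should give one histogram series per cell line; it uses `pivot` in place of the set_index/unstack chain, and the bin edges and alpha are assumed values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import pandas as pd

# Rebuild the example frame from the Data section
outer_df = pd.DataFrame({
    "patient": [p for p in ("Pat_1", "Pat_2", "Pat_3") for _ in range(3)],
    "cell": ["22RV1_PROSTATE", "DU145_PROSTATE",
             "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
    "product": [12, 15, 9] * 3,
})

# One column per cell line, one row per patient
wide = outer_df.pivot(index="patient", columns="cell", values="product")

# Each column becomes its own series in a single stacked histogram
ax = wide.plot(kind="hist", stacked=True, alpha=0.5,
               bins=list(range(8, 18)))
```

Dropping `stacked=True` would overlay the series instead of stacking them, which is closer to the purple/red/blue picture described under Desired Result.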