I have a large dataframe where I store various metadata in a multiindex (see also here).
Essentially my dataframe looks like this:
location zero A B C and so on
type zero MUR RHE DUJ RHE RHE
name zero foo bar baz boo far
1930-03-01 0 2.1 3.4 9.4 5.4 5.5
1930-04-01 0 3.1 3.6 7.3 6.7 9.5
1930-05-01 0 2.5 9.1 8.0 1.1 8.1
and so on
So that I can easily select for example all DUJ
datatypes with mydf.xs('DUJ', level = 'type', axis = 1)
.
But how can I access the strings in the type
index and eliminate doubles and maybe get some statictics?
I am looking for an output like
types('MUR', 'RHE', 'DUJ')
and/or
types:
DUJ 1
MUR 1
RHE 3
giving me a list of the datatypes and how often they occur.
I can access the index with
[In]mytypes = mydf.columns.get_level_values(1)
[In]mytypes
[Out]Index([u'zero', u'MUR', u'RHE', u'DUJ', u'RHE', u'RHE'], dtype='object')
but I cant think of any easy way to do something with this information, especially considering that my real dataset will return 1500 entries. My first idea was a simple mytypes.sort()
but apparently I Cannot sort an 'Index' object
.
Being able to describe your dataset seems like a rather important thing to me, so I would expect that there is something built in in pandas, but I cant seem to find it. And the MultiIndex documentation seems only to be concerned with constructing and setting indexes, but not analyzing them.