
I have written some software that processes data for analysis and plotting. For each type of data, the data frames are produced in a module dedicated to that type. Depending on the structure of the data, the data frame columns can be either flat or MultiIndex. I pass the data frames to a function that plots the numeric columns.

I would like to be able to "attach" a string to each "plottable" column, to be used as the plot label. This string will not be the same as the name of the column.

I can't figure out a good way to do this purely with a pandas DataFrame, and so far I don't have any other solution either.

I have seen posts about metadata, but I don't completely understand whether this functionality is supported or not. At least I can't get it to work, and frames with MultiIndex columns seem to complicate things further. If it is not supported, is it still on the to-do list? From my reading I get the impression that it has worked differently in different versions of pandas, and that it may even depend on whether Python 2 or 3 is used. Is there a convenient way to accomplish what I need with pandas data frames? Is using _metadata for this advisable? If so, how?

I have looked around quite a bit, but the MultiIndex concern in particular does not seem to be addressed anywhere.

This one seems to indicate that metadata should be supported, but is that for data frames? I need it for the Series in a DataFrame: Adding meta-information/metadata to pandas DataFrame

This one seems to be a similar question, but I tried the solution and it did not help: Propagate pandas series metadata through joins

Here is some experimentation I have done, based on my understanding of how _metadata is meant to be used. It seems to indicate that _metadata did not make any difference and that the attribute did not persist through a copy. It also shows that a MultiIndex frame is an even more "unsupported" case.

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> from numpy.random import randn  # To get values for the test frames
>>> import platform  # To print python version
>>> # A function to set labels of the columns
>>> def labelSetter(aDF) :
...     DFtmp = aDF.copy()  # Just to ensure it is a different dataframe
...     for column in DFtmp.columns :
...         DFtmp[column].myLab='This is '+column.__str__()
...         DFtmp[column].notMyLab='This should not persist'
...     return DFtmp
...
>>>
>>> print 'Pandas version: {}'.format(pd.version.version)
Pandas version: 0.15.2
>>>
>>> pd.Series._metadata.append('myLab');print pd.Series._metadata # now _metadata contains 'myLab'
['name', 'myLab']
>>>
>>> # Make dataframes normal columns and MultiIndex
>>> dfS=pd.DataFrame(randn(2, 6),columns=['a1','a2','a3','b1','b2','c1']);print dfS
         a1        a2        a3        b1        b2        c1
0 -0.934869 -0.310979  0.362635 -0.994605 -0.880114 -1.663265
1  0.205341 -1.642080 -0.732969 -0.080109 -0.082483 -0.208360
>>>
>>> dfMI=pd.DataFrame(randn(2, 6),columns=[['a','a','a','b','b','c'],['a1','a2','a3','b1','b2','c1']]);print dfMI
          a                             b                   c
         a1        a2        a3        b1        b2        c1
0 -0.578399  0.478925  1.047342 -0.087225  1.905074  0.146105
1  0.640575  0.153328 -1.117847  1.043026  0.671220 -0.218550
>>>
>>> # Run the labelSetter function on the data frames
>>> dfSWlab=labelSetter(dfS)
>>> dfMIWlab=labelSetter(dfMI)
>>>
>>> print dfSWlab['a2'].myLab
This is a2
>>> # This worked
>>>
>>> print dfSWlab['a2'].notMyLab
This should not persist
>>> # 'notMyLab' has not been appended to _metadata but the label still persists.
>>>
>>> dfSWlabCopy=dfSWlab.copy() # make a copy to see if myLab persists.
>>>
>>> dfSWlabCopy['a2'].myLab
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1942, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'myLab'
>>> # 'myLab' was appended to _metadata but still did not persist the copy
>>>
>>> print dfMIWlab['a']['a2'].myLab
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1942, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'myLab'
>>> # For the MultiIndex data frame the 'myLab' is not accessible
Wurdius
  • Why not pass the DataFrame and labels to the plotting function as two separate arguments? `_metadata` is still [in the developmental stages](https://github.com/pydata/pandas/issues/2485). – unutbu Mar 23 '15 at 13:28
  • Thanks for the suggestion! I suppose that is what I will have to do if there is no convenient way to add this information to the data frame itself. It seems there have been discussions for a long time about metadata support, so I was hoping that I was missing something. – Wurdius Mar 23 '15 at 13:37
  • Development is done through pull requests (PRs). If a PR is made which references the issue, a link is automatically added to the [github issue page](https://github.com/pydata/pandas/issues/2485), and if the PR is accepted and closes the issue, the issue will be marked "Closed". So generally, you can monitor the status of development from the issue page. – unutbu Mar 23 '15 at 14:22
  • Good! Wish I had seen that one earlier. – Wurdius Mar 23 '15 at 19:17
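
The suggestion in the first comment (pass the labels separately from the DataFrame) could look roughly like the sketch below. It is only one possible arrangement, written for Python 3 rather than the Python 2.7 session above. The helper name `numeric_columns_with_labels` and the label texts are made up for illustration and are not part of pandas. The labels live in a plain dict keyed by column key, which is a string for flat columns and a tuple for MultiIndex columns, so both cases are handled the same way and nothing depends on `_metadata`.

```python
import numpy as np
import pandas as pd


def numeric_columns_with_labels(df, labels):
    """Return (column_key, label) pairs for the numeric columns of df.

    `labels` is a plain dict keyed by column key: a string for flat columns,
    a tuple for MultiIndex columns. A column without an entry falls back to
    a label built from its own key. (Hypothetical helper, not a pandas API.)
    """
    pairs = []
    for col in df.select_dtypes(include="number").columns:
        if isinstance(col, tuple):
            default = " / ".join(str(level) for level in col)
        else:
            default = str(col)
        pairs.append((col, labels.get(col, default)))
    return pairs


# Flat columns, plus one non-numeric column that should be skipped.
df_flat = pd.DataFrame(np.random.randn(2, 3), columns=["a1", "a2", "b1"])
df_flat["note"] = ["x", "y"]
# Illustrative label texts only.
flat_labels = {"a1": "Inlet pressure [bar]", "a2": "Outlet pressure [bar]"}

# MultiIndex columns: the dict keys are the full column tuples.
df_mi = pd.DataFrame(
    np.random.randn(2, 3),
    columns=pd.MultiIndex.from_tuples([("a", "a1"), ("a", "a2"), ("b", "b1")]),
)
mi_labels = {("a", "a2"): "Outlet pressure [bar]"}

flat_pairs = numeric_columns_with_labels(df_flat, flat_labels)
mi_pairs = numeric_columns_with_labels(df_mi, mi_labels)

for col, label in flat_pairs + mi_pairs:
    print(col, "->", label)
```

A plotting function would then loop over the returned pairs, use `df[col]` for the data and the label for the axis or legend. Because the dict is independent of the frame, it is unaffected by `copy()`, at the cost of having to pass two objects around instead of one.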
