Python - Pandas: issues with multi-indexing for PanelOLS and Fama-Macbeth

Question

I have the following dataframe df:

                                 Y            X
2011-01-26 14:00:30      1      -0.0174     -0.2139
                         2       0.1234      0.1357
                         3       0.3652      0.7352
                         4       0.1111      0.3481
2011-01-26 14:01:30      1      -0.0124     -0.7139
                         2       0.1444      0.1217
                         3       0.3112      0.8882
                         4       0.1222      0.9911
.... (one minute increments)
.... (following day)
2011-01-27 14:00:30      1      -0.0884     -0.2144
                         2       0.1834      0.2227
                         3       0.3699      0.7555
                         4       0.2311      0.3481
2011-01-27 14:01:30      1      -0.0333     -0.7139
                         2       0.1444      0.1217
                         3       0.3443      0.1182
                         4       0.1442      0.9111
....

df.index
Out[38]: 

    MultiIndex(levels=[[2011-01-26 14:00:30, 2011-01-26 14:01:30, 2011-01-26 14:02:30, 2011-01-26 14:03:30, ....], [u'1', u'2', u'3',....]

If I run the following panel/Fama-Macbeth regression:

reg = pd.fama_macbeth(y=df.Y, x=df.X)

I get this error message:

    raise ValueError("Can't convert non-uniquely indexed "

ValueError: Can't convert non-uniquely indexed DataFrame to Panel

Why is that?

I am guessing it has to do with the labeling of my multi-index. My multi-index was created using:

arrays = [np.array(pd.to_datetime(df['Col1'])), np.array(df['Col2'])]
df.index = arrays

What should I do to fix this?

As the error message suggests: make your index unique. You can do this either by dropping the duplicates or by adding more information to the index so that each row is unambiguously identified by its label. — Paul H, Jul 31 '15 at 07:39
@PaulH I am not sure I understand. Are you suggesting that I may have a duplicate in my index, for instance that under a particular time the same ID is repeated more than once? — Plug4, Jul 31 '15 at 22:35
Ohh... if I run df.drop_duplicates() and then run the regression, it works. So I did have duplicates... thanks @PaulH — Plug4, Jul 31 '15 at 22:35
Yes. A non-unique index implies that two or more rows have identical labels. — Paul H, Jul 31 '15 at 22:36
I'm having the same problem, but even with drop_duplicates() I can't get it to work. http://stackoverflow.com/questions/37260035/converting-pandas-dataframe-format-to-use-panelols-cant-convert-non-uniquely — pceccon, May 16 '16 at 17:59
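
For anyone landing here, a minimal sketch of the check discussed in the comments above may help. It builds a small made-up frame shaped like the one in the question (the values, the index level names "time" and "id", and the repeated label are assumptions for illustration, not the asker's data), tests whether the (time, id) index is unique, and shows one possible reason drop_duplicates() may not be enough: it compares column values, not index labels, so two rows that share a label but differ in Y or X both survive.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's frame. The label
# (2011-01-26 14:00:30, '1') is deliberately repeated with different values.
times = pd.to_datetime([
    "2011-01-26 14:00:30", "2011-01-26 14:00:30", "2011-01-26 14:00:30",
    "2011-01-26 14:01:30", "2011-01-26 14:01:30",
])
ids = np.array(["1", "1", "2", "1", "2"])
df = pd.DataFrame(
    {
        "Y": [-0.0174, -0.0180, 0.1234, -0.0124, 0.1444],
        "X": [-0.2139, -0.2100, 0.1357, -0.7139, 0.1217],
    },
    index=pd.MultiIndex.from_arrays([times, ids], names=["time", "id"]),
)

# Is every (time, id) label unique?
index_is_unique = df.index.is_unique

# Mark every row whose label occurs more than once, to inspect the offenders.
dup_mask = df.index.duplicated(keep=False)
offending_rows = df[dup_mask]

# drop_duplicates() only removes rows with identical column values,
# so the repeated label is still present afterwards.
by_values = df.drop_duplicates()

# Keep the first row for each label. Whether "first" is the right choice
# (versus averaging or fixing the ids) depends on the data.
deduped = df[~df.index.duplicated(keep="first")]

print(index_is_unique)
print(offending_rows)
print(deduped.index.is_unique)
```

If the repeated labels are legitimate observations rather than errors, the other route suggested above applies: add another index level (or correct the ID column) so that each row gets its own label.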
