Pandas Multi-Index - Can't convert non-uniquely indexed DataFrame to Panel

Question

Given time series data, I'm trying to run a panel OLS with fixed effects in Python. I found this way to do it:

My input data looks like this (I will call it df):

    Name                            Permits_13  Score_13  Permits_14  Score_14  Permits_15  Score_15
0   P.S. 015 ROBERTO CLEMENTE       12.0        284       22          279       32          283
1   P.S. 019 ASHER LEVY             18.0        296       51          301       55          308
2   P.S. 020 ANNA SILVER            9.0         294       9           290       10          293
3   P.S. 034 FRANKLIN D. ROOSEVELT  3.0         294       4           292       1           296
4   P.S. 064 ROBERT SIMON           3.0         287       15          288       17          291
5   P.S. 110 FLORENCE NIGHTINGALE   0.0         313       3           306       4           308
6   P.S. 134 HENRIETTA SZOLD        4.0         290       12          292       17          288
7   P.S. 137 JOHN L. BERNSTEIN      4.0         276       12          273       17          274
8   P.S. 140 NATHAN STRAUS          13.0        282       37          284       59          284
9   P.S. 142 AMALIA CASTRO          7.0         290       15          285       25          284
10  P.S. 184M SHUANG WEN            5.0         327       12          327       9           327

So first I have to transform it into a MultiIndex (_13, _14 and _15 represent data from 2013, 2014 and 2015, in that order):

import numpy
import pandas

# Drop incomplete rows and fully identical rows
df = df.dropna()
df = df.drop_duplicates()

# One year-end timestamp per year: 2013, 2014, 2015
rng = pandas.date_range(start='2013-01-01', periods=3, freq='A')
# Pair every date with every school name
index = pandas.MultiIndex.from_product([rng, df['Name']], names=['date', 'id'])
# Stack the (score, permits) columns of each year on top of each other
d1 = numpy.array(df[['Score_13', 'Permits_13']])
d2 = numpy.array(df[['Score_14', 'Permits_14']])
d3 = numpy.array(df[['Score_15', 'Permits_15']])
data = numpy.concatenate((d1, d2, d3), axis=0)
s = pandas.DataFrame(data, index=index, columns=['y', 'x'])
s = s.drop_duplicates()

Which results in something like this:

        y   x
date    id      
2013-12-31  P.S. 015 ROBERTO CLEMENTE   284 12
P.S. 019 ASHER LEVY 296 18
P.S. 020 ANNA SILVER    294 9
P.S. 034 FRANKLIN D. ROOSEVELT  294 3
P.S. 064 ROBERT SIMON   287 3
P.S. 110 FLORENCE NIGHTINGALE   313 0
P.S. 134 HENRIETTA SZOLD    290 4
P.S. 137 JOHN L. BERNSTEIN  276 4
P.S. 140 NATHAN STRAUS  282 13
P.S. 142 AMALIA CASTRO  290 7
P.S. 184M SHUANG WEN    327 5
P.S. 188 THE ISLAND SCHOOL  279 4
HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES   255 4
TECHNOLOGY, ARTS, AND SCIENCES STUDIO   282 18
THE EAST VILLAGE COMMUNITY SCHOOL   306 35
UNIVERSITY NEIGHBORHOOD MIDDLE SCHOOL   277 4
THE CHILDREN'S WORKSHOP SCHOOL  302 35
NEIGHBORHOOD SCHOOL 299 15
EARTH SCHOOL    305 3
SCHOOL FOR GLOBAL LEADERS   286 15
TOMPKINS SQUARE MIDDLE SCHOOL   306 3
P.S. 001 ALFRED E. SMITH    303 20
P.S. 002 MEYER LONDON   306 8
P.S. 003 CHARRETTE SCHOOL   325 62
P.S. 006 LILLIE D. BLAKE    333 89
P.S. 011 WILLIAM T. HARRIS  320 30
P.S. 033 CHELSEA PREP   313 5
P.S. 040 AUGUSTUS SAINT-GAUDENS 326 23
P.S. 041 GREENWICH VILLAGE  326 25
P.S. 042 BENJAMIN ALTMAN    314 30
... ... ... ...
2015-12-31  P.S. 054 CHARLES W. LENG    309 2
P.S. 055 HENRY M. BOEHM 311 3
P.S. 56 THE LOUIS DESARIO SCHOOL    323 4
P.S. 057 HUBERT H. HUMPHREY 287 2
SPACE SHUTTLE COLUMBIA SCHOOL   307 0
P.S. 060 ALICE AUSTEN   303 1
I.S. 061 WILLIAM A MORRIS   291 2
MARSH AVENUE SCHOOL FOR EXPEDITIONARY LEARNING  316 0
P.S. 069 DANIEL D. TOMPKINS 307 2
I.S. 072 ROCCO LAURIE   308 1
I.S. 075 FRANK D. PAULO 318 9
THE MICHAEL J. PETRIDES SCHOOL  310 0
STATEN ISLAND SCHOOL OF CIVIC LEADERSHIP    309 0
P.S. 075 MAYDA CORTIELLA    282 19
P.S. 086 THE IRVINGTON  286 38
P.S. 106 EDWARD EVERETT HALE    280 27
P.S. 116 ELIZABETH L FARRELL    291 3
P.S. 123 SUYDAM 287 14
P.S. 145 ANDREW JACKSON 285 4
P.S. 151 LYNDON B. JOHNSON  271 27
J.H.S. 162 THE WILLOUGHBY   283 22
P.S. 274 KOSCIUSKO  282 2
J.H.S. 291 ROLAND HAYES 279 13
P.S. 299 THOMAS WARREN FIELD    288 5
I.S. 347 SCHOOL OF HUMANITIES   284 45
I.S. 349 MATH, SCIENCE & TECH.  285 45
P.S. 376    301 9
P.S. 377 ALEJANDRINA B. DE GAUTIER  277 3
P.S. /I.S. 384 FRANCES E. CARTER    291 4
ALL CITY LEADERSHIP SECONDARY SCHOOL    325 18

However, when I try to call:

reg = PanelOLS(y=s['y'], x=s[['x']], time_effects=True)

I get an error:

ValueError: Can't convert non-uniquely indexed DataFrame to Panel

This is my first time using pandas, so this may be a simple question, but I don't know what the problem is. As far as I can tell, I have a multi-index object, as required.

I don't get why I have duplicates (I added several drop_duplicates() calls trying to get rid of any duplicated data, although I don't think that is the answer). If I have data for the same school for three years, shouldn't I have duplicate data somehow (looking just at the Name, for example)?

EDIT

df is 935 rows × 7 columns after dropping the rows with NaNs, so I expected s to be 2805 rows × 2 columns, which is exactly what I have.

If I run this:

s = s.reset_index().groupby(s.index.names).first()
reg = PanelOLS(y=s['y'], x=s[['x']], time_effects=True)

I get another error:

ValueError: operands could not be broadcast together with shapes (2763,) (3,)

Thank you.

piRSquared · Accepted Answer · 2016-05-16T18:52:54.807


Using the provided pickle file, I ran the regression and it worked fine.

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x>

Number of Observations:         2763
Number of Degrees of Freedom:   4

R-squared:         0.0268
Adj R-squared:     0.0257

Rmse:             16.4732

F-stat (1, 2759):    25.3204, p-value:     0.0000

Degrees of Freedom: model 3, resid 2759

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.1666     0.0191       8.72     0.0000     0.1292     0.2041
---------------------------------End of Summary---------------------------------

I ran this in Jupyter Notebook.

edited May 16 '16 at 18:52

answered May 16 '16 at 18:07


Thank you for your answer, @piRSquared. I'd like to ask why this happens if I get rid of duplicates at the beginning of the code (before the transformation). With this code, I then get ``NotImplementedError: Only 2-level MultiIndex are supported``. – pceccon May 16 '16 at 18:10
`drop_duplicates` is checking for duplicate rows of data and not duplicate index values. You are getting your error because of duplicate index values. In order to solve your problem, you need to get rid of duplicate index values. `reset_index` puts your index back into the data portion of your dataframe and is then subject to the `drop_duplicates` method. – piRSquared May 16 '16 at 18:17
Now I get: ``AttributeError: 'NoneType' object has no attribute 'conjugate'``. I put an example of the ``s`` data frame here: https://dl.dropboxusercontent.com/u/37155213/Sample.pkl. I ran ``s = s.astype(np.float)`` before calling the PanelOLS method. Thank you for your patience. – pceccon May 16 '16 at 18:47
It is weird, @piRSquared. I could run it in an example that I also put in the Public folder mentioned before, but not through Jupyter Notebook. – pceccon May 16 '16 at 18:51
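
The distinction made in the comments, between duplicate rows of data and duplicate index values, can be illustrated with a small sketch. The toy data below is hypothetical (it is not the asker's file) and assumes that two different schools happen to share one name, which is one way a (date, id) pair can repeat. The variable names are made up for illustration, and the code only uses general pandas calls, not PanelOLS itself.

```python
import pandas as pd

# Hypothetical toy data: two schools share the name "P.S. 001".
names = ["P.S. 001", "P.S. 002", "P.S. 001"]
dates = pd.to_datetime(["2013-12-31", "2014-12-31"])

# from_product pairs every date with every name, so a repeated name
# produces a repeated (date, id) pair in the index.
index = pd.MultiIndex.from_product([dates, names], names=["date", "id"])
s = pd.DataFrame(
    {"y": [300, 310, 280, 301, 311, 281], "x": [1, 2, 3, 4, 5, 6]},
    index=index,
)

# drop_duplicates only compares the column values (y, x), not the index,
# so no row is removed here.
rows_after_drop = len(s.drop_duplicates())

# Index.duplicated flags the second and later occurrences of an index value.
n_dup_index = int(s.index.duplicated().sum())

# Option 1: keep the first row for each (date, id) pair.
deduped_mask = s[~s.index.duplicated(keep="first")]

# Option 2, as suggested in the comments: move the index into the columns,
# drop duplicates on those columns, then restore the index.
level_names = list(s.index.names)
deduped_reset = (
    s.reset_index()
    .drop_duplicates(subset=level_names)
    .set_index(level_names)
)

print(rows_after_drop, n_dup_index)
print(deduped_mask.index.is_unique, deduped_reset.index.is_unique)
```

Either option silently discards the later rows for a repeated name. If the repeated names really are different schools, a unique identifier for the id level may be a better fix than dropping rows.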
