The pandas documentation gives the following code, which works fine:

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])

I tried the following code, based on the same idea, but it does not work:

hi5 = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], 
    index = [['a','a','a','b'],[1,2,3,1]], 
    columns=[['Ohio', 'Ohio', 'Colorado'], 
    ['Green', 'Red', 'Green']])

It gives the following error:

AssertionError: 2 columns passed, passed data had 3 columns
Kiran
  • Possible duplicate of [Panda AssertionError columns passed, passed data had 2 columns](https://stackoverflow.com/questions/38927230/panda-assertionerror-columns-passed-passed-data-had-2-columns) – Tarun Kolla Sep 05 '19 at 03:31
  • @TarunKolla that question is about a single array of column labels, which works fine. In this question, I am passing two arrays of column labels, which does not work. – Kiran Sep 05 '19 at 03:36
  • Possible duplicate of [Python pandas Multindex column](https://stackoverflow.com/questions/40182072/python-pandas-multindex-column) – Craig Sep 05 '19 at 03:48
  • The answer on that post indicates that this has been reported as inconsistent behavior and is still an open issue: https://stackoverflow.com/a/40182863/7517724 – Craig Sep 05 '19 at 03:52

2 Answers

Apparently, you need to use the pd.DataFrame.from_records constructor for that:

>>> hi5 = pd.DataFrame.from_records([[1,2,3],[4,5,6],[7,8,9],[10,11,12]],
...     index = [['a','a','a','b'],[1,2,3,1]],
...     columns=[['Ohio', 'Ohio', 'Colorado'],
...     ['Green', 'Red', 'Green']])
>>>
>>> hi5
     Ohio     Colorado
    Green Red    Green
a 1     1   2        3
  2     4   5        6
  3     7   8        9
b 1    10  11       12

I can only guess that a list of lists has no shape attribute, so the generic constructor does not support this kind of data.
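
A workaround that does not depend on how the constructor interprets nested lists is to build the row and column labels explicitly. This is a minimal sketch, assuming that pd.MultiIndex.from_arrays is an acceptable substitute for the nested lists in the question; the names `row_index` and `col_index` are introduced here for illustration only.

```python
import pandas as pd

# Build both label sets as explicit MultiIndex objects instead of nested lists.
row_index = pd.MultiIndex.from_arrays([['a', 'a', 'a', 'b'], [1, 2, 3, 1]])
col_index = pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                                       ['Green', 'Red', 'Green']])

# The data is still a plain list of lists, as in the question.
hi5 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
                   index=row_index,
                   columns=col_index)

print(hi5)
```

With this form the constructor receives a column index with three entries, which matches the three columns of data.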

crayxt

I think it goes deeper:

The code below works:

frame = pd.DataFrame(np.array([[ 0,  1,  2], [ 3,  4,  5], [ 6,  7,  8], [ 9, 10, 11]]),
index=[['a', 'a', 'a', 'b'], [1, 2, 3, 1]],
columns=[['Ohio', 'Ohio', 'Colorado'],
['Green', 'Red', 'Green']])

The code below doesn't work; the only difference is that we now pass a list of lists instead of an array:

frame = pd.DataFrame([[ 0,  1,  2], [ 3,  4,  5], [ 6,  7,  8], [ 9, 10, 11]],
index=[['a', 'a', 'a', 'b'], [1, 2, 3, 1]],
columns=[['Ohio', 'Ohio', 'Colorado'],
['Green', 'Red', 'Green']])

AssertionError: 2 columns passed, passed data had 3 columns

However, if we split the construction into separate steps, it works again:

frame = pd.DataFrame([[ 0,  1,  2], [ 3,  4,  5], [ 6,  7,  8], [ 9, 10, 11]])
frame.index=[['a', 'a', 'a', 'b'], [1, 2, 3, 1]]
frame.columns=[['Ohio', 'Ohio', 'Colorado'],
['Green', 'Red', 'Green']]
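
One way to see what the step-by-step version ends up with is to inspect the labels after assignment. The sketch below is only an illustration; the names `data` and `nested_columns` are introduced here and are not part of the original code. It suggests, without proving, why the error message mentions 2 columns: the nested list itself has length 2 (one entry per level), while the MultiIndex that pandas builds from it has 3 entries.

```python
import pandas as pd

data = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
nested_columns = [['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']]

# Same step-by-step construction as above.
frame = pd.DataFrame(data)
frame.index = [['a', 'a', 'a', 'b'], [1, 2, 3, 1]]
frame.columns = nested_columns

# The raw nested list has one entry per level, not one per column.
print(len(nested_columns))

# After assignment, pandas holds a MultiIndex with one entry per column.
print(type(frame.columns).__name__, len(frame.columns))
```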
KWx