Python pandas MultiIndex column

Question

First of all, I'm using Python 3.5 in a Jupyter notebook.

I want to create a DataFrame to show some data in a report. I want it to have two index levels (excuse me if that is not the correct term; I'm not used to working with pandas).

I have this example code that works:

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=[['Ohio', 'Ohio', 'Ohio'], ['Green', 'Red', 'Green']])

But when I try to take it to my case, it gives me an error:

cell_rise_Inv= pd.DataFrame([[0.00483211, 0.00511619, 0.00891821, 0.0449637, 0.205753], 
                             [0.00520049, 0.00561577, 0.010993, 0.0468998, 0.207461],
                             [0.00357213, 0.00429087, 0.0132186, 0.0536389, 0.21384],
                             [-0.0021868, -0.0011312, 0.0120546, 0.0647213, 0.224749],
                             [-0.0725403, -0.0700884, -0.0382486, 0.0899121, 0.313639]], 
                            index =[['transition [ns]','transition [ns]','transition [ns]','transition [ns]','transition [ns]'],
                                   [0.0005, 0.001, 0.01, 0.1, 0.5]],
                            columns =[[0.01, 0.02, 0.05, 0.1, 0.5],['capacitance [pF]','capacitance [pF]','capacitance [pF]','capacitance [pF]','capacitance [pF]']])
cell_rise_Inv

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-89-180a1ad88403> in <module>()
      6                             index =[['transition [ns]','transition [ns]','transition [ns]','transition [ns]','transition [ns]'],
      7                                    [0.0005, 0.001, 0.01, 0.1, 0.5]],
----> 8                             columns =[[0.01, 0.02, 0.05, 0.1, 0.5],['capacitance [pF]','capacitance [pF]','capacitance [pF]','capacitance [pF]','capacitance [pF]']])
      9 cell_rise_Inv

C:\Users\Josele\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    261                     if com.is_named_tuple(data[0]) and columns is None:
    262                         columns = data[0]._fields
--> 263                     arrays, columns = _to_arrays(data, columns, dtype=dtype)
    264                     columns = _ensure_index(columns)
    265 

C:\Users\Josele\Anaconda3\lib\site-packages\pandas\core\frame.py in _to_arrays(data, columns, coerce_float, dtype)
   5350     if isinstance(data[0], (list, tuple)):
   5351         return _list_to_arrays(data, columns, coerce_float=coerce_float,
-> 5352                                dtype=dtype)
   5353     elif isinstance(data[0], collections.Mapping):
   5354         return _list_of_dict_to_arrays(data, columns,

C:\Users\Josele\Anaconda3\lib\site-packages\pandas\core\frame.py in _list_to_arrays(data, columns, coerce_float, dtype)
   5429         content = list(lib.to_object_array(data).T)
   5430     return _convert_object_array(content, columns, dtype=dtype,
-> 5431                                  coerce_float=coerce_float)
   5432 
   5433 

C:\Users\Josele\Anaconda3\lib\site-packages\pandas\core\frame.py in _convert_object_array(content, columns, coerce_float, dtype)
   5487             # caller's responsibility to check for this...
   5488             raise AssertionError('%d columns passed, passed data had %s '
-> 5489                                  'columns' % (len(columns), len(content)))
   5490 
   5491     # provide soft conversion of object dtypes

AssertionError: 2 columns passed, passed data had 5 columns

Any ideas? I can't understand why the example works and mine doesn't. :S

Thank you in advance :).

The error indicates that you are not passing in data of a shape that matches the index: AssertionError: 2 columns passed, passed data had 5 columns — Mad Physicist, Oct 21 '16 at 17:32
It looks like your index repeats `'capacitance [pF]'` 5 times, while the data only has two columns... — Mad Physicist, Oct 21 '16 at 17:33
Also, you may want to switch the order of the label (`'capacitance [pF]'`) and the numbers in the multi-index. — Mad Physicist, Oct 21 '16 at 17:37
What do you mean by "Can you show the actual line you are running, with all the inputs?"? — JoseleMG, Oct 21 '16 at 17:43
Sorry, I misread your output as all being part of the error. Didn't realize you had the input at the top. I feel like an idiot. — Mad Physicist, Oct 21 '16 at 17:47
Looks like a real bug. I filed a request with pandas: https://github.com/pandas-dev/pandas/issues/14467 — Mad Physicist, Oct 21 '16 at 18:36

piRSquared · Answer 1 · 2016-10-21T17:41:26.547

It does appear to be inconsistent. I'd build the indexes explicitly with the pd.MultiIndex.from_arrays constructor:

idx = pd.MultiIndex.from_arrays([['transition [ns]'] * 5,
                                 [0.0005, 0.001, 0.01, 0.1, 0.5]])
col = pd.MultiIndex.from_arrays([[0.01, 0.02, 0.05, 0.1, 0.5],
                                 ['capacitance [pF]'] * 5])

cell_rise_Inv= pd.DataFrame([[0.00483211, 0.00511619, 0.00891821, 0.0449637, 0.205753], 
                             [0.00520049, 0.00561577, 0.010993, 0.0468998, 0.207461],
                             [0.00357213, 0.00429087, 0.0132186, 0.0536389, 0.21384],
                             [-0.0021868, -0.0011312, 0.0120546, 0.0647213, 0.224749],
                             [-0.0725403, -0.0700884, -0.0382486, 0.0899121, 0.313639]], 
                            index=idx,
                            columns=col)
cell_rise_Inv
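
To check that both axes really end up with two levels, here is a minimal sketch using the same index values but small stand-in data (the `np.arange` values and the names `df` and `value` are assumptions for illustration, not part of the original table). The label is placed first in the column index here so that lookups read as (label, number).

```python
import numpy as np
import pandas as pd

# Same index values as in the question; the label comes first on both axes.
idx = pd.MultiIndex.from_arrays([['transition [ns]'] * 5,
                                 [0.0005, 0.001, 0.01, 0.1, 0.5]])
col = pd.MultiIndex.from_arrays([['capacitance [pF]'] * 5,
                                 [0.01, 0.02, 0.05, 0.1, 0.5]])

# Stand-in data (hypothetical): a plain nested list, as in the question.
data = np.arange(25).reshape(5, 5).tolist()
df = pd.DataFrame(data, index=idx, columns=col)

# Both axes should report two levels.
print(df.index.nlevels, df.columns.nlevels)

# Select one cell with a (label, number) tuple per axis.
value = df.loc[('transition [ns]', 0.01), ('capacitance [pF]', 0.05)]
print(value)
```

Because the indexes are built explicitly, the DataFrame constructor does not have to guess whether a list of two lists means two levels or two columns.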

My bad. I didn't read OP's output fully. Downvote removed. Slow day, didn't intend to take it out on you. — Mad Physicist, Oct 21 '16 at 17:50
@MadPhysicist Kudos for leaving a comment on a downvote and for correcting a mistake. I recently and mistakenly closed someone's post because I was tired... it happens. — piRSquared, Oct 21 '16 at 17:51

Mad Physicist · Accepted Answer · 2016-10-21T18:36:32.513

There is one major difference between your code and the example: the example passes a numpy array as the input rather than a nested list. In fact, adding np.array(...) around your list works just fine:

cell_rise_Inv= pd.DataFrame(
        np.array([[0.00483211, 0.00511619, 0.00891821, 0.0449637, 0.205753], 
                  [0.00520049, 0.00561577, 0.010993, 0.0468998, 0.207461],
                  [0.00357213, 0.00429087, 0.0132186, 0.0536389, 0.21384],
                  [-0.0021868, -0.0011312, 0.0120546, 0.0647213, 0.224749],
                  [-0.0725403, -0.0700884, -0.0382486, 0.0899121, 0.313639]]), 
        index=[['transition [ns]'] * 5,
               [0.0005, 0.001, 0.01, 0.1, 0.5]],
        columns=[['capacitance [pF]'] * 5,
                 [0.01, 0.02, 0.05, 0.1, 0.5]])

I shortened the repeated strings in the index and swapped the order of the index levels, but those are not significant changes.

EDIT: I did a little investigating. If you pass in a nested list (without the np.array call), the call works when columns is omitted, and even when columns is a 1D list. For some reason, a nested list of two elements passed as columns is not interpreted as a MultiIndex unless the input data is an ndarray.
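
A small sketch for reproducing the comparison on your own installation. The data and the names `rows`, `columns`, `array_is_multi` and `list_outcome` are made up for illustration, and the nested-list result may differ between pandas versions, so the script only reports what it sees rather than assuming an outcome.

```python
import numpy as np
import pandas as pd

# Stand-in data (hypothetical), much smaller than the table in the question.
rows = [[1, 2, 3], [4, 5, 6]]
columns = [['capacitance [pF]'] * 3, [0.01, 0.02, 0.05]]

# ndarray input: the two lists are treated as two column levels.
from_array = pd.DataFrame(np.array(rows), columns=columns)
array_is_multi = isinstance(from_array.columns, pd.MultiIndex)

# Nested-list input: record whatever this pandas version does.
try:
    from_list = pd.DataFrame(rows, columns=columns)
    if isinstance(from_list.columns, pd.MultiIndex):
        list_outcome = 'MultiIndex'
    else:
        list_outcome = 'flat index'
except (AssertionError, ValueError, TypeError) as exc:
    list_outcome = 'error: {}'.format(exc)

print(pd.__version__, array_is_multi, list_outcome)
```

If the nested-list case reports an error, wrapping the data in np.array(...) or passing a pd.MultiIndex for columns avoids it.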

I filed issue #14467 with pandas based on this question.
