Having created the structure of the multiindex columns, I'm unable to fit this template to the dataframe; how to do this?

Question

I have the right multi-index structure for the dataset, however I'm unable to fit this template to the dataset.

Importing the dataset:

data = pd.read_excel('IRCC_M_TRStudy_0001_E.xls')

code for multiindex columns:

years = (2015,2016,2017,2018,2019)
months = [
    ("Jan", "Feb", "Mar"),
    ("Apr", "May", "Jun"),
    ("Jul", "Aug", "Sep"),
    ("Oct", "Nov", "Dec"),
]
tuples = [(year, f"Q{i + 1}", month) for year in years for i in range(4) for month in months[i]]
multi_index = pd.MultiIndex.from_tuples(tuples)

My attempt to fit this template to the dataset:

df = pd.DataFrame(data, index = data['Country of Citizenship'], columns = multi_index)

the result:

consists of an index of 'countries of citizenship' and multi-index columns consisting of 3 levels - years(2015 - 2019), 4 Quarters for each year and 3 months per quarter (as expected). However, all the data is missing - all columns and rows show 'nan' values.

The expected results should look like this:

                                     2015
                  Q1                 Q2             Q3            Q4
           Jan  Feb  Mar      Apr  May Jun     Jul Aug Sep     Oct Nov Dec   
Country
USA        34   33   23      12   34   23      23  12  34       56   67   57
India      33   12   29      16   35   27      25  15  33       57   63   51

The above table repeats for years 2016,2017,2018,2019 from left to right. The data above is only for the purpose of representation, I want to fit the multi-index template to the dataset that consists of similar data. Also, how can I position the index 'country' a row below the row containing months as shown in the expected results?

I cannot tell why your data isn't fitting well without some sample data from the excel sheet. Your code seems fine otherwise. To get the country where you want it in the index, try this: index = pd.Series(data['Country of Citizenship'], name='Country') — run-out, Apr 11 '19 at 04:49
It's my first time dealing with multi-indexed data; I'm not sure how to import such datasets into pandas - could you please tell me how you would deal with a dataset like that? Maybe I'm just approaching it all wrong...Thank you so much! — irahul, Apr 11 '19 at 15:04
No problem. I need to have a peak at your data. Can you call 'data.head()' and post the first few columns of data as an edit to your post above? Once I see the structure of your data it will be easier to guide you. Thanks. — run-out, Apr 11 '19 at 21:09
That's kind of the whole problem. My dataset got distorted when I brought it into pandas. The actual dataset looks exactly like the 'expected results' above. I was trying to get the same exact structure of the data in pandas to work with it. I'm not sure if thats how we should deal with multi-index datasets though...really interested in what you think about it! Thanks! — irahul, Apr 12 '19 at 15:06
Just missed out on an important detail - the columns repeat for years 2015 - 2019 from left to right. I made up the data inside it because I couldn't copy paste the columns from the actual dataset.. — irahul, Apr 12 '19 at 15:08
This might answer your question. https://stackoverflow.com/questions/35300099/reading-excel-sheet-as-multiindex-dataframe-through-pd-read-excel — run-out, Apr 13 '19 at 04:25

Having created the structure of the multiindex columns, I'm unable to fit this template to the dataframe; how to do this?

0 Answers0