Thanks for letting me know the right way to post questions for Pandas. Any case, below are my findings regarding this problem. Multindex is certainly powerful from the standpoint of organizing data and exporting to csv or excel. However, accessing and selecting data has been challenging to accomplish.
Best Practices for initializing multi index
- I found it easy to pre allocate the index rather than create them on the fly. Creating index on the fly is not efficient and you will be faced with lexsort warning.
- Sort the data frame index once the data frame is initialized.
When accessing do not leave the row or column identifier empty. Use :
for site_name in site_s:
no_of_progs = len(site_s[site_name])
prog_name_in_sites = site_s[site_name].keys()
prog_level_cols = ['A','B', 'C']
prog_level_cols = ['A', 'C']
site_level_cols = ['A PLAN', 'A TOTAL','A UP','A DOWN','A AVAILABLE' ]
if counter == 0:
pd_index_col = pd.MultiIndex.from_product([ [site_name], prog_name_in_sites,prog_level_cols],
names=['SITE', 'PROGRAM','TYPE'])
else:
pd_index_col = pd_index_col.append(pd.MultiIndex.from_product([ [site_name], prog_name_in_sites,prog_level_cols],
names=['SITE', 'PROGRAM','TYPE']))
if no_of_progs >1:
pd_index_col = pd_index_col.append(pd.MultiIndex.from_product([ [site_name], ['LINES'] ,site_level_cols],
names=['SITE', 'PROGRAM','TYPE']))
counter = counter+1
df_A_site_level = pd.DataFrame(0,columns=arr_wk_num_wkly,index= pd_index_col, dtype=np.float64)
df_A_site_level.sort_index(inplace=True)
For setting and getting below are the two methods I recommend
df.iloc
- if you know the positional index of rows and/or columns
df.loc
- if you want to access data based on labels
Accessing using loc - Use the below to set or get cell/row values
idx = pd.IndexSlice
df_A_site_level[idx[site_name, :,'C'], df_A_site_level[0:no]]
Accessing using iloc - Use the below to set or get cell/row values
df_A_site_level.iloc[no_1:no_2,no3:no_4]