Pandas efficient Multiindex getting and setting

Question

Below is a snapshot of my data structure in pandas

I build the below structure in a for loop
I am using sortlevel to lexsort the dataframe

df.sortlevel(inplace=True)

1) I need to get an efficient way to get and set specific rows as shown below. This is the formula i am using and it is not efficient.
a) Will i be able to set the values of the rows using assignment

df.loc[idx['AAA', slice(None),'LLL']].iloc[:,0:n]

df.loc[idx['AAA', slice(None),'LLL']].iloc[:,0:n] = another_df

2) How to Efficiently sum the columns for a result below

df.loc[idx['AAA', slice(None),'LLL']].iloc[:,0:n].sum(axis=1)

I am looking for an efficient way to slice the dataframe.

Thanks

Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. — MaxU - stand with Ukraine, Aug 17 '17 at 20:12

score 0 · Accepted Answer · answered Sep 09 '17 at 23:07

Thanks for letting me know the right way to post questions for Pandas. Any case, below are my findings regarding this problem. Multindex is certainly powerful from the standpoint of organizing data and exporting to csv or excel. However, accessing and selecting data has been challenging to accomplish.

Best Practices for initializing multi index

I found it easy to pre allocate the index rather than create them on the fly. Creating index on the fly is not efficient and you will be faced with lexsort warning.
Sort the data frame index once the data frame is initialized.

When accessing do not leave the row or column identifier empty. Use :

for site_name in site_s:

no_of_progs =  len(site_s[site_name])
prog_name_in_sites = site_s[site_name].keys()
prog_level_cols = ['A','B', 'C']
prog_level_cols = ['A', 'C']

site_level_cols = ['A PLAN', 'A TOTAL','A UP','A DOWN','A AVAILABLE' ]

if counter == 0:
    pd_index_col = pd.MultiIndex.from_product([ [site_name], prog_name_in_sites,prog_level_cols],
                     names=['SITE', 'PROGRAM','TYPE'])
else: 
    pd_index_col = pd_index_col.append(pd.MultiIndex.from_product([ [site_name], prog_name_in_sites,prog_level_cols],
                     names=['SITE', 'PROGRAM','TYPE']))
if no_of_progs >1:
    pd_index_col = pd_index_col.append(pd.MultiIndex.from_product([ [site_name], ['LINES']  ,site_level_cols],
                     names=['SITE', 'PROGRAM','TYPE']))
counter = counter+1

df_A_site_level = pd.DataFrame(0,columns=arr_wk_num_wkly,index= pd_index_col, dtype=np.float64)

df_A_site_level.sort_index(inplace=True)

For setting and getting below are the two methods I recommend

df.iloc - if you know the positional index of rows and/or columns
df.loc - if you want to access data based on labels

Accessing using loc - Use the below to set or get cell/row values

idx = pd.IndexSlice
df_A_site_level[idx[site_name, :,'C'], df_A_site_level[0:no]]

Accessing using iloc - Use the below to set or get cell/row values

df_A_site_level.iloc[no_1:no_2,no3:no_4]

Pandas efficient Multiindex getting and setting

1 Answers1