0

So I have a dataframe

like this

with lists of stations 'STA' (of arbitrary length) and elevations 'Z' for each River Station 'RS'. It's index is ['River','Reach','RS','Weir STA'].

What I need to do is for each 'RS', add an additional row for each item in the 'STA' list corresponding to that 'RS'. This row needs to have a 'Weir STA' of that 'STA' item's value, and a 'Weir Elev' of 0 or NaN.

I've been trying different combinations of nested for i,df in DF.groupby(level=0) and different combinations of append, stack, melt, etc but can't figure it out. Now how would I go about this operation if instead I wanted to use the values of 'Z' for the 'Weir Elev'?

Quang Hoang
  • 146,074
  • 10
  • 56
  • 74

1 Answers1

0

Ok, @MaxU had a function that does the trick!

Here dataframe "weirr" is the above DF without columns ['STA','Z'], and dataframe "se" is the columns ['STA','Z'] with their proper ['River','Reach','RS'] indices.

def explode(df, lst_cols, fill_value='', preserve_index=False):
    #https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows/40449726#40449726
    # make sure `lst_cols` is list-alike
    if (lst_cols is not None
        and len(lst_cols) > 0
        and not isinstance(lst_cols, (list, tuple, np.ndarray, pd.Series))):
        lst_cols = [lst_cols]
    # all columns except `lst_cols`
    idx_cols = df.columns.difference(lst_cols)
    # calculate lengths of lists
    lens = df[lst_cols[0]].str.len()
    # preserve original index values    
    idx = np.repeat(df.index.values, lens)
    # create "exploded" DF
    res = (pd.DataFrame({
                col:np.repeat(df[col].values, lens)
                for col in idx_cols},
                index=idx)
             .assign(**{col:np.concatenate(df.loc[lens>0, col].values)
                            for col in lst_cols}))
    # append those rows that have empty lists
    if (lens == 0).any():
        # at least one list in cells is empty
        res = (res.append(df.loc[lens==0, idx_cols], sort=False)
                  .fillna(fill_value))
    # revert the original index order
    res = res.sort_index()
    # reset index if requested
    if not preserve_index:        
        res = res.reset_index(drop=True)
    return res

se=se.reset_index()
se = explode(se,['STA','Z']).set_index(['River','Reach','RS'])
se = se.rename(columns={'STA':'Weir STA'})
weirr = weirr.set_index('Weir STA', append=True)
weirr = pd.concat([weirr,se],sort=True)