1

Hello I am dealing with a dataframe like below:

   yearStart    2014  2015  2016  2017  2018  2019  
0    2015        0    150   200     0     0     0       
1    2016        0     0    200   140    35    10       
2    2017        0     0     0     20    12    12

Typically, it is an financial report with all charges, begining when contract is signed (column 'year Start') and lasting during several years

   yearStart Year+0  Year+1  Year+2  Year+3  Year+4 ... Year+N
0     2015    150     200      0      0       0       
1     2016    200     140     35      0       0       
2     2017    20      12      12      0       0      

How to reshape the dataframe in order to have datas stored in a relative-date style from the first year of the contract .

I tried by iterrows() on each line and copy the relevant columns in another dataframe but it takes too much time...

EDIT:

Well I forgot to say that maybe one year in the relevant period of the contract, value is 0 and it shouldn't be forgetten. the columns to consider are between the date in yearStart and a end, given as a parameter. input is more like this:

   0    2015        0    150   200     0    13     0       
   1    2016        0     0    200   140    35     0    10   
   2    2017        0     0     0     20    12     0    12

Thank you

phil
  • 57
  • 5
  • I change answer by your changed question. – jezrael Sep 19 '17 at 14:42
  • Regarding my updated question, it works perfectly (128 s to reshape a 200 000 *4 Dataframe) I used the mask syntax with series to make it (thanks to Zero and Jarad). A funny fact: when I made a 'print()' to examinate each row in apply method, I noticed that the first row is displayed two times , even if there are no consequences on the final result – phil Sep 20 '17 at 08:26

2 Answers2

1

Create new rows with apply with filtering, then assign new columns names

df1 = df.apply(lambda x: pd.Series(x[x!=0].values), 1).fillna(0).astype(int)
df1.columns = df.columns.tolist()[:len(df1.columns)]
df1 = df1.reindex(columns=df.columns, fill_value=0)
print (df1)
   yearStart  2014  2015  2016  2017  2018  2019
0       2015   150   200     0     0     0     0
1       2016   200   140    35    10     0     0
2       2017    20    12    12     0     0     0

If larger DataFrame is possible use Divakar function justify_rows:

def justify_rows(a, side='left'):
    mask = a>0
    justified_mask = np.sort(mask,1)
    if side=='left':
        justified_mask = justified_mask[:,::-1]
    out = np.zeros_like(a) 
    out[justified_mask] = a[mask]
    return out

df1 = pd.DataFrame(justify_rows(df.values), columns=df.columns, index=df.index)
print (df1)
   yearStart  2014  2015  2016  2017  2018  2019
0       2015   150   200     0     0     0     0
1       2016   200   140    35    10     0     0
2       2017    20    12    12     0     0     0

If want strings Years:

cols = ['yearStart'] + ['Year+{}'.format(x) for x in range(len(df.columns) - 1)]
df1 = pd.DataFrame(justify_rows(df.values), columns=cols, index=df.index)
print (df1)
   yearStart  Year+0  Year+1  Year+2  Year+3  Year+4  Year+5
0       2015     150     200       0       0       0       0
1       2016     200     140      35      10       0       0
2       2017      20      12      12       0       0       0

EDIT:

For second solution need this solution for select first consecutive 0:

def justify_rows(a, side='left'):
    mask = a.cumsum(axis=1) != 0
    print (mask)
    justified_mask = np.sort(mask,1)
    print (justified_mask)
    if side=='left':
        justified_mask = justified_mask[:,::-1]
    out = np.zeros_like(a) 
    out[justified_mask] = a[mask]
    print (out)
    return out

cols = ['Year+{}'.format(x) for x in range(len(df.columns) - 1)]
df1 = df[['yearStart']].join(pd.DataFrame(justify_rows(df.values[:, 1:]),
                                          columns=cols, index=df.index))
print (df1)
   yearStart  Year+0  Year+1  Year+2  Year+3  Year+4  Year+5
0       2015     150     200       0      13       0       0
1       2016     200     140      35       0       0       0
2       2017      20      12       0       0       0       0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1
df=df.replace({0:np.nan})
df=df.loc[:,df.isnull().sum(0).ne(3)]

Option 1 :

df.apply(lambda x : (x[x.notnull()].values.tolist()+x[x.isnull()].values.tolist()),1).fillna(0)

Out[145]:

   yearStart   2015   2016  2017  2018  2019
0     2015.0  150.0  200.0   0.0   0.0   0.0
1     2016.0  200.0  140.0  35.0  10.0   0.0
2     2017.0   20.0   12.0  12.0   0.0   0.0

Option 2 :

df.apply(lambda x: sorted(x, key=pd.isnull), 1).fillna(0)


Out[145]: 
   yearStart   2015   2016  2017  2018  2019
0     2015.0  150.0  200.0   0.0   0.0   0.0
1     2016.0  200.0  140.0  35.0  10.0   0.0
2     2017.0   20.0   12.0  12.0   0.0   0.0
BENY
  • 317,841
  • 20
  • 164
  • 234