2

I have the following df:

           sales2001   sales2002   sales2003  sales2004
   200012  19.12       0.98 
   200101  19.1        0.98        2.3
   200102  21          0.97        0.8
    ...
   200112  19.12       0.99        2.4
   200201              0.98        2.5
   200202              0.97        0.8        1.2

I would like to shift the content in order to align it a timegap view, as follow:

           sales+1y   sales+2y
   200012  19.12       0.98 
   200101  0.98        2.3       
   200102  0.97        0.8
    ...
   200112  0.99        2.4
   200201  0.98        2.5
   200202  0.8         1.2

basically aligning the forecasted data points to a fixed timegap to the index. I tried with iterrows and dynamically calling the columns given the index but cannot make it work. do you guys have any suggestion?

lorenzo
  • 397
  • 4
  • 16

2 Answers2

2

Use justify with DataFrame.dropna and axis=1 for remove all columns with at least one NaN:

df1 = (pd.DataFrame(justify(df.values, invalid_val=np.nan, side='right'), index=df.index)
          .dropna(axis=1))

If need select last columns by position:

df1 = pd.DataFrame(justify(df.values, invalid_val=np.nan, side='right')[:, -2:],index=df.index)

Or:

df1 = (pd.DataFrame(justify(df.values, invalid_val=np.nan, side='right'), index=df.index)
        .iloc[:, -2:])

df1.columns = [f'sales+{i+1}y' for i in range(len(df1.columns))]
print (df1)
        sales+1y  sales+2y
200012     19.12      0.98
200101      0.98      2.30
200102      0.97      0.80
200112      0.99      2.40
200201      0.98      2.50
200202      0.80      1.20
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    jezrael, looks very nice. let me try it with the data – lorenzo May 28 '19 at 11:19
  • hi Jez, I am having some problems here: first, np.nan masking fails cos of float, so I switched to pd.null(). second, I belive the right justification should be left? or am I misinterpreting the function? in any case, the third problem is that this returns me an empty df, with only index.. – lorenzo May 29 '19 at 02:28
  • the problem seems to be at if axis==1: out[justified_mask] = a[mask] while I see the a[mask] correctly maps true/false, it is not properly passed to out? and the output is full nan – lorenzo May 29 '19 at 03:09
  • I don't understand how sorting the mask could ever make move the below data in place. does someone have any explanation? – lorenzo May 29 '19 at 10:15
  • @lorenzo - please check [this](https://stackoverflow.com/a/52621908/2901002) changed justify for working for general data, not only numeric. – jezrael May 29 '19 at 10:17
1

Another option is to use pd.wide_to_long and pivot:

# here I assume the index name is index
new_df = pd.wide_to_long(df.reset_index(), 'sales', i='index', j='sale_end').reset_index()

# if index is datetime, then use dt.year
new_df['periods'] = new_df['sale_end'] - new_df['index']//100

# pivot
new_df.dropna().pivot(index='index',columns='periods', values='sales')

output:

periods -1      0       1       2
idx                 
200012  NaN     NaN     19.12   0.98
200101  NaN     19.10   0.98    2.30
200102  NaN     21.00   0.97    0.80
200112  NaN     19.12   0.99    2.40
200201  0.98    2.50    NaN     NaN
200202  0.97    0.80    1.20    NaN
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • ok, I manage to make it work! I had some issues with the index type but fixed. thank you very much, very helpful! – lorenzo May 30 '19 at 07:39