Question: How can I replace the values of a specific row within a pandas method chain?

Here is my code:

import numpy as np
import pandas as pd

days = np.arange(0, 11)
rets = np.array([0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05, 0.10])
start = 100

df = pd.DataFrame({"time": days, "return":rets})

new_df = (df
          .assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
)
new_df.iloc[0] = new_df.iloc[0].replace(np.nan,1) # add to method chain above

How can I do the operation in the last line within the method chain? By method chain I mean:

new_df = (df
          .assign(...) 
          .replace(...)
          )
FredMaster
  • You want to replace not all `na` values, but only in the first row. [`pd.replace`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html) doesn't have an option to specify which row to fill, so I doubt that exactly what you ask for is possible. – Vladimir Fokow Aug 24 '22 at 18:07
  • I've tried `.fillna({0:0}, axis=1)` [by analogy to this answer](https://stackoverflow.com/a/47315262/14627505), but sadly: `NotImplementedError: Currently only can fill with dict/Series column by column`. – Vladimir Fokow Aug 24 '22 at 18:13
  • yes, pd.replace does not have this option. In my code `replace` was just a place holder. Any method would be fine. – FredMaster Aug 24 '22 at 18:22

2 Answers

You can actually use `.replace` for this (it works with arbitrary values, not only NaN):

new_df = (df
          .assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
          .T.replace({0: np.nan}, 1).T
)

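`.replace` has no option to restrict the replacement to a row, but it does have one for a column: a dict such as `{0: np.nan}` means "look for NaN in column `0`". So we can transpose the dataframe, replace in column `0` (the original first row), and transpose back.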


`.fillna` can work in a similar way (but it only replaces NaN values):

new_df = (df
          .assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
          .T.fillna({0:1}).T
)

I had to transpose the dataframe before and after filling because currently it "can only fill with dict/Series column by column".
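One thing to watch with the double transpose: the intermediate frame holds all columns in a single common dtype, so an integer column such as `time` may come back as float. The following is a minimal sketch that checks the chained result against the original two-step version; the trailing `.astype` call and the names `lags`, `chained` and `expected` are my own additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

days = np.arange(0, 11)
rets = np.array([0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05, 0.10])
df = pd.DataFrame({"time": days, "return": rets})

# Same lag columns as in the question, built once so both versions share them
lags = {f"lag_{i}": df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in range(6)}

chained = (df
           .assign(**lags)
           .T.fillna({0: 1}).T          # fill NaN only in the row labelled 0
           .astype({"time": "int64"})   # assumption: restore the integer dtype after the round trip
)

# Reference result from the two-step version in the question
expected = df.assign(**lags)
expected.iloc[0] = expected.iloc[0].replace(np.nan, 1)

# Raises an AssertionError if the two results differ in values
pd.testing.assert_frame_equal(chained, expected, check_dtype=False)
```

NaN values outside the first row (for example at the end of the shifted lag columns) are left alone in both versions.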

Vladimir Fokow
  • Thanks, this works. Very helpful. I find it strange that there is no specified method for this simple operation. – FredMaster Aug 24 '22 at 18:24
  • Actually your code needs to be slightly amended to be `.T.fillna({0:1}).T` – FredMaster Aug 24 '22 at 18:26
  • @FredMaster, yes. Also, found a way to do this with `.replace` ! – Vladimir Fokow Aug 24 '22 at 18:28
  • Thanks for your alternative solution. I have meanwhile encountered a third option using `pipe` and writing a separate function for this. This is more verbose but a bit more readable for my future self. I will post it later. Thanks for your help – FredMaster Aug 24 '22 at 18:37

Vladimir has posted correct and useful answers to my initial question.

I have meanwhile encountered a third option. It is more verbose, as it requires writing a separate function, but it has the benefit of being more readable in the method chain itself. At least for me.

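import numpy as np
import pandas as pd

# Helper that replaces values in the first row only
def replace_first_row(_df, to_replace=np.nan, value=1):
    _df = _df.copy()  # work on a copy so the caller's frame is not mutated
    _df.iloc[0] = _df.iloc[0].replace(to_replace=to_replace, value=value)
    return _df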

days = np.arange(0, 11)
rets = np.array([0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05, 0.10])
start = 100

df = pd.DataFrame({"time": days, "return":rets})

new_df = (df
          .assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
          .pipe(replace_first_row, np.nan, 1)
          )

new_df
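A variation on the same `pipe` idea, sketched here as one possible alternative: build a boolean mask that is true only where the first row is NaN and hand it to `DataFrame.mask`, which returns a new frame instead of assigning into a row. The helper name `fill_first_row` and the variables `lags` and `base` are hypothetical names chosen for this example, and the helper assumes the frame has at least one row.

```python
import numpy as np
import pandas as pd

def fill_first_row(frame, value=1):
    """Return a new frame with NaN in the first row replaced by `value`."""
    # Boolean array of the frame's shape, True only at NaN cells of the first row
    cond = np.zeros(frame.shape, dtype=bool)
    cond[0, :] = frame.iloc[0].isna().to_numpy()
    # mask() replaces cells where cond is True and leaves `frame` untouched
    return frame.mask(cond, value)

days = np.arange(0, 11)
rets = np.array([0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05, 0.10])
df = pd.DataFrame({"time": days, "return": rets})

lags = {f"lag_{i}": df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in range(6)}

base = df.assign(**lags)
new_df = base.pipe(fill_first_row, value=1)
```

Because the mask only covers the first row, NaN values further down (such as the tail of the shifted lag columns) stay as they are, and `base` itself is not modified.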
FredMaster