
This is similar to the question "Pandas interpolate within a groupby", but the answer there applies interpolate() to all columns. How do I limit interpolate() to a single column?

Input

    filename    val1    val2
t                   
1   file1.csv   5       10
2   file1.csv   NaN     NaN
3   file1.csv   15      20
6   file2.csv   NaN     NaN
7   file2.csv   10      20
8   file2.csv   12      15

Expected Output

    filename    val1    val2
t                   
1   file1.csv   5       10
2   file1.csv   NaN     15
3   file1.csv   15      20
6   file2.csv   NaN     NaN
7   file2.csv   10      20
8   file2.csv   12      15

This attempt returns only the val2 column, not the rest of the columns:

df = df.groupby('filename').apply(lambda group: group['val2'].interpolate(method='index'))
dch2404
  • If it only returns the `val2` column (as expected), then assign the result back to just the `val2` column: `df['val2'] = df.groupby(...` – BeRT2me Nov 23 '22 at 17:16
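A minimal sketch of the comment's suggestion follows. It writes the group-wise result back into `val2` only. It uses `transform` instead of `apply` because `transform` returns a Series aligned to the original `t` index, so the assignment lines up row by row. The sample frame is rebuilt by hand from the Input table, assuming `t` is an integer index.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data (index "t", two files)
df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5, np.nan, 15, np.nan, 10, 12],
        "val2": [10, np.nan, 20, np.nan, 20, 15],
    },
    index=pd.Index([1, 2, 3, 6, 7, 8], name="t"),
)

# transform() returns a Series indexed like df, so it can be written
# straight back into "val2". The interpolation runs separately inside
# each filename group, using the t index values as x-coordinates.
df["val2"] = df.groupby("filename")["val2"].transform(
    lambda s: s.interpolate(method="index")
)

print(df)
```

With this, `val1` is left untouched and `val2` matches the Expected Output. That includes the NaN at `t=6`, because there is no earlier value inside `file2.csv` to interpolate from.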

1 Answer


A direct approach:

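import pandas as pd

df = pd.read_clipboard() # clipboard contains OP sample data
# interpolate only on col "val2"
df["val2_interpolated"] = (
    df[["filename","val2"]]
    .groupby('filename', group_keys=False)  # keep the original t index
    .apply(lambda x:x)["val2"]  # WTF: identity apply just turns the GroupBy back into a DataFrame
    .interpolate(method='linear')
)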

returns:

    filename  val1  val2  val2_interpolated
t
1  file1.csv   5.0  10.0               10.0
2  file1.csv   NaN   NaN               15.0
3  file1.csv  15.0  20.0               20.0
6  file2.csv   NaN   NaN               20.0
7  file2.csv  10.0  20.0               20.0
8  file2.csv  12.0  15.0               15.0
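The output above fills `t=6` (the first `file2.csv` row) with 20.0, but the question's Expected Output keeps NaN there. The `.interpolate()` after `.apply(lambda x:x)` is called on an ordinary DataFrame or Series, not per group, so the interpolation crosses the boundary between files. The minimal sketch below contrasts whole-column and per-group interpolation on the same sample data, rebuilt by hand. The names `whole_column` and `per_group` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5, np.nan, 15, np.nan, 10, 12],
        "val2": [10, np.nan, 20, np.nan, 20, 15],
    },
    index=pd.Index([1, 2, 3, 6, 7, 8], name="t"),
)

# Whole column: t=6 is filled from file1.csv's value at t=3 and file2.csv's at t=7
whole_column = df["val2"].interpolate(method="index")

# Per group: a group's leading NaN stays NaN (nothing earlier within that file)
per_group = df.groupby("filename")["val2"].transform(
    lambda s: s.interpolate(method="index")
)

df["val2_interpolated"] = per_group
print(pd.DataFrame({"whole_column": whole_column, "per_group": per_group}))
```

If the leak across files is unwanted, assigning `per_group` gives the NaN at `t=6` that the question expects, while `t=2` is 15.0 either way.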
LoneWanderer
  • `groupby` has an `ignore_index` keyword, that'd be better than using `reset_index` after the fact~ – BeRT2me Nov 23 '22 at 17:17
  • Perfect! Exactly what I was looking for. And yes, even as a newbie at pandas, that lambda x:x was a WTF moment! – dch2404 Nov 23 '22 at 17:44