
I was given data in an unusual layout:

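import pandas as pd

# Every row has five entries (name, c1..c4); the name == 2 rows match the table below
df = pd.DataFrame([[1, 2, None, None, None], [1, None, 4, None, None], [1, None, None, 9, None], [1, None, None, None, 4], [2, 1, None, None, None], [2, None, 4, None, None]])
df.columns = ['name', 'c1', 'c2', 'c3', 'c4']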

   name   c1   c2   c3   c4
      1  2.0  NaN  NaN  NaN
      1  NaN  4.0  NaN  NaN
      1  NaN  NaN  9.0  NaN
      1  NaN  NaN  NaN  4.0
      2  1.0  NaN  NaN  NaN
      2  NaN  4.0  NaN  NaN

For each value of the key "name", I want to fill the NaN values in that group's first row with the first non-NaN value found further down the same column, and condense the group to a single row, like this:

   name   c1   c2   c3   c4
      1  2.0  4.0  9.0  4.0
      2  1.0  4.0  NaN  NaN

What's the best function to accomplish this? A groupby with first() to grab the first non-NA value?

  • Use [`justify` from this answer](https://stackoverflow.com/a/44559180/3483203): `justify(df.filter(like='c').values, invalid_val=np.nan, axis=0)`. Then drop null values. If you need to do it per group, however, that answer won't work. – user3483203 Aug 08 '19 at 20:46
  • In this case `df.groupby('name', as_index=False).max()` would work, but that's probably only true for your example. The answer with `justify` linked above is better in this case. – Erfan Aug 08 '19 at 20:49
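
To illustrate the caveat about `max()`: it matches "first non-NaN value" only when each column holds at most one non-NaN value per group. The sketch below uses made-up data (not the asker's) in which `c1` has two values for the same name, so `max()` and `first()` disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical data: for name == 1, column c1 holds 2.0 and then 5.0.
df = pd.DataFrame({
    "name": [1, 1, 1],
    "c1": [np.nan, 2.0, 5.0],
    "c2": [4.0, np.nan, np.nan],
})

# max() returns the largest non-NaN value per group and column.
by_max = df.groupby("name", as_index=False).max()

# first() returns the earliest non-NaN value per group and column.
by_first = df.groupby("name", as_index=False).first()

print(by_max)    # c1 is 5.0
print(by_first)  # c1 is 2.0
```

Both calls agree on `c2`, because that column has only one non-NaN value in the group.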

1 Answer


How about this?

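import pandas as pd
df = pd.DataFrame([[1, 2, None, None, None], [1, None, 4, None, None], [1, None, None, 9, None], [1, None, None, None, 4], [2, 1, None, None, None], [2, None, 4, None, None]])
df.columns = ['name', 'c1', 'c2', 'c3', 'c4']
# Backfill within each name so values are never pulled in from the next group
value_cols = ['c1', 'c2', 'c3', 'c4']
df[value_cols] = df.groupby('name')[value_cols].bfill()
# Keep the first (now filled) row of each group
newdf = df.groupby('name').head(1)
newdf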

name    c1      c2      c3      c4
1       2.0     4.0     9.0     4.0
2       1.0     4.0     NaN     NaN
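
As a possible alternative, `groupby(...).first()` (the idea raised in the question) skips NaN by default and returns one row per name directly. The sketch below also shows why a `bfill` over the whole frame, rather than per group, depends on row order: the `reordered` frame is a hypothetical rearrangement of the same data, with the name 2 rows first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [1, 2, np.nan, np.nan, np.nan],
        [1, np.nan, 4, np.nan, np.nan],
        [1, np.nan, np.nan, 9, np.nan],
        [1, np.nan, np.nan, np.nan, 4],
        [2, 1, np.nan, np.nan, np.nan],
        [2, np.nan, 4, np.nan, np.nan],
    ],
    columns=["name", "c1", "c2", "c3", "c4"],
)

# One row per name, taking the first non-NaN value in each column.
condensed = df.groupby("name", as_index=False).first()
print(condensed)

# Hypothetical reordering: name == 2 rows come before name == 1 rows.
reordered = pd.concat(
    [df[df["name"] == 2], df[df["name"] == 1]], ignore_index=True
)

# A frame-wide bfill pulls c3 and c4 for name 2 from the name 1 rows.
leaky = reordered.bfill().groupby("name").head(1)

# first() is computed per group, so name 2 keeps NaN in c3 and c4.
safe = reordered.groupby("name", as_index=False).first()
```

With the original row order both approaches give the same result, since the name 2 rows come last and there is nothing below them to leak in.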
moys