1

I want to replace NaNs in a Pandas DataFrame column with non-NaN values from within the same group. In my case these are geo coordinates, and for some reason the lookup failed for some data points. E.g.:

df.groupby('place')

looks like

place| lat | lng
-----------------
foo  | NaN | NaN
foo  | 1   | 4
foo  | 1   | 4
foo  | NaN | NaN
bar  | 5   | 7
bar  | 5   | 7
bar  | NaN | NaN
bar  | NaN | NaN
bar  | 5   | 7

==> what I want:

foo  | 1   | 4
foo  | 1   | 4
foo  | 1   | 4
foo  | 1   | 4
bar  | 5   | 7
bar  | 5   | 7
bar  | 5   | 7
bar  | 5   | 7
bar  | 5   | 7

In my case the lat/lng values within the same 'place' grouping are constant, so picking any non-NaN value would work. I'm also curious how I could do a fill with e.g. mean/majority count.

Florian
    Possible duplicate of [Pandas: filling missing values by mean in each group](http://stackoverflow.com/questions/19966018/pandas-filling-missing-values-by-mean-in-each-group) – Jan Trienes Apr 05 '17 at 18:16

3 Answers

2

Using groupby along with ffill and bfill

df[['lat', 'lng']] = df.groupby('place')[['lat', 'lng']].transform(lambda s: s.ffill().bfill())

df:

  place  lat  lng
0   foo    1    4
1   foo    1    4
2   foo    1    4
3   foo    1    4
4   bar    5    7
5   bar    5    7
6   bar    5    7
7   bar    5    7
8   bar    5    7
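For anyone who wants to reproduce this, here is a minimal sketch. The construction of `df` is an assumption that rebuilds the sample from the question, and `filled` and `cols` are just illustrative names. It runs both the forward fill and the backward fill inside each 'place' group via `transform`, so no value is taken from a different group.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample data from the question.
df = pd.DataFrame({
    "place": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "bar"],
    "lat": [np.nan, 1, 1, np.nan, 5, 5, np.nan, np.nan, 5],
    "lng": [np.nan, 4, 4, np.nan, 7, 7, np.nan, np.nan, 7],
})

cols = ["lat", "lng"]

# Forward-fill, then back-fill, separately within each 'place' group.
filled = df.copy()
filled[cols] = df.groupby("place")[cols].transform(lambda s: s.ffill().bfill())

print(filled)
```

A group that contains only NaNs would stay NaN with this approach, since there is nothing inside the group to fill from.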
Vaishali
  • thanks, `df[['lat', 'lng']]=df[['place','lat', 'lng']].groupby('place').ffill().bfill()` did the trick for me. – Florian Apr 06 '17 at 01:04
1

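If the values are the same within each group, and the rows are ordered so that no group after the first starts with a NaN, the following should work. Note that it does not group by 'place', so with any other ordering a value can carry over from a neighbouring group:

df = df.ffill().bfill()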
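Whether an ungrouped fill is safe depends on row order. The sketch below uses a small hypothetical frame (not the data from the question) in which the second group starts with a NaN, and compares the ungrouped fill with a per-group one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 'bar' group starts with a NaN.
df = pd.DataFrame({
    "place": ["foo", "foo", "bar", "bar"],
    "lat": [1.0, 1.0, np.nan, 5.0],
})

# Ungrouped fill: the NaN in 'bar' is forward-filled from the 'foo' row above it.
ungrouped = df["lat"].ffill().bfill()

# Grouped fill: each group only sees its own values.
grouped = df.groupby("place")["lat"].transform(lambda s: s.ffill().bfill())

print(ungrouped.tolist())  # [1.0, 1.0, 1.0, 5.0]
print(grouped.tolist())    # [1.0, 1.0, 5.0, 5.0]
```

In the question's sample no later group begins with a NaN, which is why the ungrouped version happens to give the expected result there.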
Peaceful
1

Fill each NaN with the first valid value in its group (this returns a new DataFrame rather than modifying df in place):

df.fillna(df.groupby('place').transform('first'))

  place  lat  lng
0   foo  1.0  4.0
1   foo  1.0  4.0
2   foo  1.0  4.0
3   foo  1.0  4.0
4   bar  5.0  7.0
5   bar  5.0  7.0
6   bar  5.0  7.0
7   bar  5.0  7.0
8   bar  5.0  7.0
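The question also asks about filling with the mean or the majority value. The same `fillna` plus `transform` pattern might be extended along these lines. This is a sketch on hypothetical data in which the values within a group differ, and `most_common` is an illustrative helper, not a pandas function.

```python
import numpy as np
import pandas as pd

# Hypothetical data where values inside a group are not constant.
df = pd.DataFrame({
    "place": ["foo", "foo", "foo", "foo", "bar", "bar", "bar"],
    "lat": [np.nan, 1.0, 1.0, 4.0, 5.0, np.nan, 5.0],
})

grouped = df.groupby("place")["lat"]

# Fill with the group mean (NaNs are ignored when computing the mean).
mean_filled = df["lat"].fillna(grouped.transform("mean"))


def most_common(s):
    """Return the most frequent non-NaN value of s, or NaN if there is none."""
    modes = s.mode()  # mode() drops NaN by default; ties are returned sorted
    return modes.iloc[0] if not modes.empty else np.nan


# Fill with the majority value of each group.
mode_filled = df["lat"].fillna(grouped.transform(most_common))

print(mean_filled.tolist())  # [2.0, 1.0, 1.0, 4.0, 5.0, 5.0, 5.0]
print(mode_filled.tolist())  # [1.0, 1.0, 1.0, 4.0, 5.0, 5.0, 5.0]
```

When several values tie for most frequent, this helper picks the smallest one, which is an arbitrary choice.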
piRSquared