[Modified based on the admirable answer from PSidom]
Using groupby()
is the key point indeed, but it might be confusing not to mention what bfill()
does as it's not doing what you actually think it does.
Let's take a quick glance at the doc here.
Instead of back filling the data like what the OP wants, it actually just fill in the missing data with non-missing data in the next column.
It works great with groupby()
in this case, while you also need to do groupby('*your group*').ffill()
for forward filling in case that the data you have are more complicated.
For further illustration, let's modify your data like this:
import numpy as np
import pandas as pd
raw_data = {'surface': [np.nan, np.nan, 'round', 'square', np.nan, np.nan, np.nan, np.nan],
'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami', 'Miami', 'Miami', 'San Francisco', 'Miami']}
df = pd.DataFrame(raw_data, columns = ['surface', 'city'])
df
# surface city
#0 NaN San Francisco
#1 NaN Miami
#2 round San Francisco
#3 square Miami
#4 NaN Miami
#5 NaN Miami
#6 NaN San Francisco
#7 NaN Miami
With only df.groupby('city').bfill()
, you'll got:
df2 = df.groupby('city').bfill()
df2
# surface city
#0 round San Francisco
#1 square Miami
#2 round San Francisco
#3 square Miami
#4 NaN Miami
#5 NaN Miami
#6 NaN San Francisco
#7 NaN Miami
See what is going on there? bfill()
did the job in row 0 and 1, but remain row 4 ~ 7 unchanged. You should use both bfill()
and ffill()
instead. Maybe something like this:
df3 = df2.groupby('city').ffill()
df3
# surface city
#0 round San Francisco
#1 square Miami
#2 round San Francisco
#3 square Miami
#4 square Miami
#5 square Miami
#6 round San Francisco
#7 square Miami
To be noticed, you shouldn't use something like df.groupby('city').bfill().ffill()
. It'll fill in something wrong there.