I have this column in a pandas dataframe:

df = pd.DataFrame({'City': ['Barcelona', 'Barcelona', False, False, False, 'Barcelona', 'Paris', False, 'London', False, 'London']})

         City
0   Barcelona
1   Barcelona
2       False
3       False
4       False
5   Barcelona
6       Paris
7       False
8      London
9       False
10     London

And I want this result (note the False between Paris and London):

out = pd.DataFrame({'City': ['Barcelona', 'Barcelona', 'Barcelona', 'Barcelona', 'Barcelona', 'Barcelona', 'Paris', False, 'London', 'London', 'London']})

         City
0   Barcelona
1   Barcelona
2   Barcelona
3   Barcelona
4   Barcelona
5   Barcelona
6       Paris
7       False
8      London
9      London
10     London

The False values between the two cells containing Barcelona should all change to Barcelona. However, the one between Paris and London should stay False because we don't know which city it belongs to. Finally, the False between the last two London values should also be filled with London.

I tried changing the False values by looking only at the cells directly above and below, and of course only the last missing value changed to London.

However, I think a solution might be to create a temporary column that propagates the name of the last city seen before a run of consecutive False values, and to copy it into the real column only if the next valid city matches the one being propagated. If it doesn't match, the propagation should be rejected and the False values should remain.

This is just an idea; I don't know whether this is how it's usually done, nor how to build it. How can I solve my problem? Is there a direct way?

mab15

2 Answers

You can replace False with NaN, compute both a forward fill (ffill) and a backward fill (bfill), and keep the fill only where the two agree, using boolean indexing:

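import numpy as np

# Treat False as missing, then fill forward
s1 = df['City'].replace({False: np.nan})
s2 = s1.ffill()

# Keep the forward fill only where the backward fill agrees
df.loc[s2.eq(s1.bfill()), 'City'] = s2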

Output:

         City
0   Barcelona
1   Barcelona
2   Barcelona
3   Barcelona
4   Barcelona
5   Barcelona
6       Paris
7       False
8      London
9      London
10     London

Intermediates:

         City      ffill      bfill  ffill==bfill
0   Barcelona  Barcelona  Barcelona          True
1   Barcelona  Barcelona  Barcelona          True
2   Barcelona  Barcelona  Barcelona          True
3   Barcelona  Barcelona  Barcelona          True
4   Barcelona  Barcelona  Barcelona          True
5   Barcelona  Barcelona  Barcelona          True
6       Paris      Paris      Paris          True
7       False      Paris     London         False
8      London     London     London          True
9      London     London     London          True
10     London     London     London          True
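
The intermediates table above could be rebuilt with something like the following sketch. It assumes the sample `df` from the question; the name `intermediates` is only illustrative, and `Series.mask` is swapped in for `replace` to mark the False placeholders as missing.

```python
import pandas as pd

df = pd.DataFrame({'City': ['Barcelona', 'Barcelona', False, False, False,
                            'Barcelona', 'Paris', False, 'London', False,
                            'London']})

# Mark the False placeholders as missing; real city names are left alone.
s1 = df['City'].mask(df['City'].eq(False))

# Put the helper columns side by side for inspection.
intermediates = pd.DataFrame({
    'City': df['City'],
    'ffill': s1.ffill(),
    'bfill': s1.bfill(),
})

# True where both fill directions agree, i.e. where a fill is safe.
intermediates['ffill==bfill'] = intermediates['ffill'].eq(intermediates['bfill'])

print(intermediates)
```

Only row 7 ends up with `ffill==bfill` set to False, which is why it is the only False left in the output.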
mozway
import pandas as pd
import numpy as np

df = pd.DataFrame({'City': ['Barcelona', 'Barcelona', False, False, False, 'Barcelona', 'Paris', False, 'London', False, 'London']})

# Mask of rows whose forward-filled city is not 'Paris' (hardcoded for this sample data)
m = df['City'].replace(False, np.nan).ffill().ne('Paris')
# Forward-fill the False values within the masked rows only, so the gap after Paris stays False
df[m] = df[m].replace(False, np.nan).ffill()

print(df)

Output:

         City
0   Barcelona
1   Barcelona
2   Barcelona
3   Barcelona
4   Barcelona
5   Barcelona
6       Paris
7       False
8      London
9      London
10     London
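
The mask above names 'Paris' explicitly, so it fits only this sample. A possible data-independent variant is sketched below; it falls back on the same ffill/bfill agreement rule as the other answer. The function name `fill_enclosed_gaps` and its `placeholder` parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd


def fill_enclosed_gaps(s, placeholder=False):
    """Fill runs of `placeholder` only when the same value bounds them on both sides."""
    # Hide the placeholder so ffill/bfill skip over it.
    is_gap = s.eq(placeholder)
    known = s.mask(is_gap)
    forward = known.ffill()
    backward = known.bfill()
    # Accept the fill only where both directions agree on the value.
    agree = is_gap & forward.eq(backward)
    return s.mask(agree, forward)


cities = pd.Series(['Rome', False, 'Rome', False, 'Oslo', False])
print(fill_enclosed_gaps(cities).tolist())
# → ['Rome', 'Rome', 'Rome', False, 'Oslo', False]
```

Under this rule a gap at the very start or end of the column stays as the placeholder, because one of the two fills has nothing to compare against.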
Laurent B.