
I have a list of city names and a df with city, state, and zipcode columns. Some of the zipcodes are missing. When a zipcode is missing, I want to use a generic zipcode based on the city. For example, the city is San Jose so the zipcode should be a generic 'SJ_zipcode'.

pattern_city = '|'.join(cities) #works

foundit = ( (df['cty_nm'].str.contains(pattern_city, flags=re.IGNORECASE)) & (df['zip_cd']==0) & (df['st_cd'].str.match('CA') ) ) #works--is this foundit a df?

df['zip_cd'] = foundit.replace( 'SJ_zipcode' ) #nope, error

Error: "Invalid dtype for pad_1d [bool]"

Implemented with where

df['zip_cd'].where( (df['cty_nm'].str.contains(pattern_city, flags=re.IGNORECASE)) & (df['zip_cd']==0) & (df['st_cd'].str.match('CA') ), "SJ_Zipcode", inplace = True) #nope, empty set; all set to nan?

Implemented with loc

df['zip_cd'].loc[ (df['cty_nm'].str.contains(pattern_city, flags=re.IGNORECASE)) & (df['zip_cd']==0) & (df['st_cd'].str.match('CA') ) ] = "SJ_Zipcode"

Some possible solutions that did not work

One additional requirement: I want to update the existing dataframe with these values, not create a new dataframe.

forest.peterson

1 Answer


Try this:

import pandas as pd

df = pd.DataFrame(data)  # data holds the city/state/zip records shown below
df

    city         state        zip
0   Burbank      California   44325
1   Anaheim      California   nan
2   El Cerrito   California   57643
3   Los Angeles  California   56734
4   san Fancisco California   32819

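def generate_placeholder_zip(row):
    # Fill a missing zip with a placeholder built from the city name
    if pd.isnull(row['zip']):
        row['zip'] = row['city'] + '_ZIPCODE'
    return row

# apply returns a new DataFrame, so assign the result back to df
df = df.apply(generate_placeholder_zip, axis=1)
df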

    city          state         zip
0   Burbank       California    44325
1   Anaheim       California    Anaheim_ZIPCODE
2   El Cerrito    California    57643
3   Los Angeles   California    56734
4   san Fancisco  California    32819
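
If you would rather keep the mask from the question, a vectorized alternative is to pass the mask and the column name to a single df.loc call. Some notes on the attempts in the question: foundit is a boolean Series, not a DataFrame, so calling replace on it cannot produce zipcodes. where keeps values where the condition is True and replaces them where it is False, so that call is inverted. df['zip_cd'].loc[...] = ... is chained assignment, which may write to a temporary copy instead of df. The following is only a minimal sketch: the sample rows and the contents of cities are made up for illustration, and it assumes a missing zipcode is stored as 0, as in the question.

```python
import re
import pandas as pd

# Hypothetical sample data, using the column names from the question
df = pd.DataFrame({
    "cty_nm": ["San Jose", "san jose", "Fresno", "San Jose"],
    "st_cd": ["CA", "CA", "CA", "NV"],
    "zip_cd": [0, 95112, 0, 0],
})
cities = ["San Jose"]  # assumed contents of the cities list

# Escape each name so regex metacharacters in a city name are matched literally
pattern_city = "|".join(re.escape(c) for c in cities)

# Boolean Series: True for rows whose zipcode should be filled in
mask = (
    df["cty_nm"].str.contains(pattern_city, flags=re.IGNORECASE)
    & (df["zip_cd"] == 0)
    & (df["st_cd"] == "CA")
)

# zip_cd holds integers here, so cast to object before storing a string
df["zip_cd"] = df["zip_cd"].astype(object)

# A single .loc call with row mask and column label writes into df itself
df.loc[mask, "zip_cd"] = "SJ_zipcode"
print(df)
```

Only the first row matches all three conditions, so only that row receives the placeholder. No second dataframe is created; df is updated directly.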