2

I am trying to make new columns in a dataframe based on how many times a new value gets paired with another column.

original dataframe:

Name Primary Address Alternative Address
John Doe 123 Fudge Rd. UHP.INC
Lee Smith Pro Sports,LLC
Hank Hill Pharm Tank.co PodRacing.Cool
Hank Hill GhoulSchool,343
Hank Hill MoneyTree Rd.

Dataframe Im trying to achieve where if there is multiple alternative addresses to one name they split out to as many columns as needed:

Name Primary Address Alternative Address Alternative Address_2 Alternative Address_3
John Doe 123 Fudge Rd. UHP.INC
Lee Smith Pro Sports,LLC
Hank Hill Pharm Tank.co PodRacing.Cool GhoulSchool,343 MoneyTree Rd.
  • 1
    Please provide enough code so others can better understand or reproduce the problem. – Community Aug 17 '22 at 21:34
  • Maybe it'll be more convenient for you to have such columns: "Primary address" and "Alternative Address**es**" which contains all the alternative addresses in a **list** for each person? – Vladimir Fokow Aug 17 '22 at 21:58

1 Answers1

1

First, create a column that contains a list of all "alternative addresses" for each person:

f = lambda g: pd.Series([g['Primary'].dropna().iloc[0],
                         list(g['Alternative'].dropna())], 
                        index=['Primary', 'Alternative'])

new_df = df.groupby('Name').apply(f).reset_index()

If you need, you can then split this column into new columns:

alt_addresses = (pd.DataFrame(new_df['Alternative'].tolist())
                 .add_prefix('Alternative_Address_'))
result = pd.concat([new_df.drop(columns='Alternative'), alt_addresses], axis=1)

Results:

df = pd.DataFrame(
    {'Name': ['John Doe', 'Lee Smith', 'Hank Hill', 'Hank Hill', 'Hank Hill'], 
     'Primary': ['123 Fudge Rd.', 'Pro Sports,LLC', 'Pharm Tank.co', np.nan, np.nan], 
     'Alternative': ['UHP.INC', None, 'PodRacing.Cool', 'GhoulSchool,343', 'MoneyTree Rd.']})
print(new_df)

        Name         Primary                                       Alternative
0  Hank Hill   Pharm Tank.co  [PodRacing.Cool, GhoulSchool,343, MoneyTree Rd.]
1   John Doe   123 Fudge Rd.                                         [UHP.INC]
2  Lee Smith  Pro Sports,LLC                                                []
print(result)

        Name         Primary  Alternative_Address_0  Alternative_Address_1  Alternative_Address_2
0  Hank Hill   Pharm Tank.co         PodRacing.Cool        GhoulSchool,343          MoneyTree Rd.
1   John Doe   123 Fudge Rd.                UHP.INC                   None                   None   
2  Lee Smith  Pro Sports,LLC                   None                   None                   None   
Vladimir Fokow
  • 3,728
  • 2
  • 5
  • 27