Find all matching keys and values in data frame

Question

Given a Pandas DF like the one below were the key is "state county" I want to find all the zip codes that are associated with each key.

On a small scale I accomplished this using a loop but it is not efficient for the 50,000 keys I need to check. Is there a programing concept I can use to solve this without having to check each line against all the other lines?

Original df

state county	zip
VA Culpeper County	20106
VA Fauquier County	20115
VA Loudoun County	20101
VA Loudoun County	20102
VA Loudoun County	20103
VA Loudoun County	20104
VA Loudoun County	20105
VA Loudoun County	20107
VA Manassas City	20108
VA Manassas city	20110
VA Manassas City	20113
VA Prince William County	20109
VA Prince William County	20111
VA Prince William County	20112

Output df

state county	zip 1	zip 2	zip 3	zip 4	zip 5	zip 6
VA Culpeper County	20106
VA Fauquier County	20115
VA Loudoun County	20101	20102	20103	20104	20105	20107
VA Manassas City	20108	20110	20113
VA Prince William County	20109	20111	20112

Are you using pandas or what structure is a DF? If so you should specify it is a pandas dataframe and tag the question accordingly. — Drubio, Feb 08 '21 at 19:03

PieCot · Accepted Answer · 2021-02-08T19:31:03.253

You can collect all the zips for every state and then create a new DataFrame

df2 = df.groupby('state county').agg({'zip': list})
df3 = pd.DataFrame(df2.zip.tolist(), index=df2.index)

You can rename the columns of the resulting DataFrame, so that they are in the form zip <n>:

df3.columns = list(map("zip {}".format, df3.columns + 1))

df3
                          zip 1  zip 2  zip 3  zip 4  zip 5  zip 6
state county                                                      
VA Culpeper County        20106   None   None   None   None   None
VA Fauquier County        20115   None   None   None   None   None
VA Loudoun County         20101  20102  20103  20104  20105  20107
VA Manassas City          20108  20110  20113   None   None   None
VA Prince William County  20109  20111  20112   None   None   None

Find all matching keys and values in data frame

1 Answers1