I have a dataframe that outputs multiple rows for some IDs based on how much info I have on each, and I want to flatten it where possible. An example of my dataframe is below.
id address city state country company title email
001 80 Fake St. San F. CA USA null null personal@gmail.com
001 null null null null Amazon CEO null
002 2 Test Ave. San F. CA USA null null test@aol.com
002 null null null null Google CEO googleceo@gmail.com
I would like for the rows for 001 to be combined into one, since all values can fit into the same row. For 002, it is ok for them to be on separate rows, since there needs to be two rows anyways to store both emails.
I've tried
df.groupby('id', squeeze=True)
which gets me close to what I want, but for 002 it duplicates all data from the first row and puts it on the second. I would prefer that those stay as null, so that the same data isn't being processed twice once I receive the final output. So my desired final output is:
id address city state country company title email
001 80 Fake St. San F. CA USA Amazon CEO personal@gmail.com
002 2 Test Ave. San F. CA USA null null test@aol.com
002 null null null null Google CEO googleceo@gmail.com
I never know which columns will be able to be merged/flattened, so I can't drop duplicates based on certain columns (I don't think). Any suggestions on how to achieve this output would be great!