I have a dataset that has 2 copies of each record. Each record has an ID, and each copy has the same ID.
15 of 18 fields are identical in both copies of the records. But in 3 fields, the top row contains 2 items and 1 NAN; the bottom row contains 1 item (where top row had a NAN) and 2 NANs (where top row had items). Sometimes there are random NANs that don't follow this pattern.
I need to collapse each record into one so that I have a single record that contains all 3 non-NAN fields.
I have tried various versions of groupby
. But that omits the 3 fields I need, which are all string-based. And it doubles the values of certain numeric fields.
If all else fails, I'll turn the letter fields into number codes and df.groupby(['ID']).agg('sum')
But I figure there's probably a smarter way to do this.