I have a pandas dataframe that is the result of a query where 1 column creates duplicate rows. I need help identifying non-duplicate values for duplicates by name, then dynamically creating new columns with all values, then delete duplicates. Below Mike has duplicates in column "Code" and Mark in "Lang", so I'd like one row for each with new columns for the non-dupe values.
ID | Name | Code | Lang |
1 | Mike | 25 | SQL |
1 | Mike | 26 | SQL |
1 | Mike | 27 | SQL |
2 | Mark | 39 | NoSQL |
2 | Mark | 39 | SQL |
Loop through and identify which columns are not duplicates, copy non-duplicate value, write to new column in first near duplicate row, delete duplicates.
ID | Name | Code | Code2 | Code3 | Lang | Lang2 |
1 | Mike | 25 | 26 | 27 | SQL | . |
2 | Mark | 39 | . | . | NoSQL | SQL |
I'm able to get to just the duplicate rows by using the below, but have done a lot of research and am having trouble getting to my result. I'm exploring pivot and melt as an option but am stuck on the dynamic column part.
dup_rows = orig_df[orig_df.duplicated(['Name'])]