I have an interesting question for which I am not getting any built-in solution in Pandas.
I have data like:
A B C D E F
1 null null Red null 12
1 89 76 Red null null
1 null null null null 12
I am trying to get an output like:
1 89 76 Red null 12
Now, these data are messages coming at different times. Many messages belong to a single transaction.
So, suppose A is the identifier of the transaction. Some messages with the same identifier have a value for column B, some have value for column C.
So, I want to group by the unique identifier and then get the non-null values for each column.
So, this is a grouping and then aggregating task. But unlike count or things like that, I am really creating a new row.
I am not finding a way to do so in pandas. Any help is greatly appreciated, Thanks!
Example:>
A B C D
1 null null 100
1 90 null null
2 null 1000 999
1 80
2 1000 null 799
df.groupby('A').first()
B C D
A
1 null null 100
2 null 1000 999
But the required output is:
B C D
A
1 90 80 100
2 1000 1000 999