
I have an interesting question for which I cannot find a built-in solution in Pandas.

I have data like:

A B    C    D    E     F
1 null null Red  null  12
1 89   76   Red  null  null
1 null null null null  12

I am trying to get an output like:

1 89 76 Red null 12

Now, these data are messages coming at different times. Many messages belong to a single transaction.

So, suppose A is the identifier of the transaction. Some messages with the same identifier have a value for column B, some have value for column C.

So, I want to group by the unique identifier and then get the non-null values for each column.

So, this is a grouping and then aggregating task. But unlike count or similar aggregations, I am really constructing a new row.

I am not finding a way to do this in pandas. Any help is greatly appreciated. Thanks!

Example:

    A   B       C       D
    1   null    null    100
    1   90      null    null
    2   null    1000    999
    1   null    80      null
    2   1000    null    799

 df.groupby('A').first()
      B     C    D
A
1  null  null  100
2  null  1000  999

But the required output is:

     B     C    D
A
1  90     80  100
2  1000  1000  999
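To make the difference above reproducible, here is a small sketch of the second example (the data and column names are taken from the question; the key assumption, confirmed later in the comments, is that `null` is the literal string `'null'`, not a real missing value):

```python
import numpy as np
import pandas as pd

# Reconstruction of the example above; "null" is the literal string,
# not a real missing value.
df = pd.DataFrame({
    'A': [1, 1, 2, 1, 2],
    'B': ['null', 90, 'null', 'null', 1000],
    'C': ['null', 'null', 1000, 80, 'null'],
    'D': [100, 'null', 999, 'null', 799],
})

# first() only skips real NaN, so for group 1 it returns the
# string 'null' in columns B and C:
print(df.groupby('A').first())

# After converting the strings to NaN, first() picks the first
# non-null value per column within each group:
clean = df.replace('null', np.nan).groupby('A').first()
print(clean)
```

With the string placeholders converted to `NaN`, `clean` matches the required output: group 1 becomes `90, 80, 100` and group 2 becomes `1000, 1000, 999`.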
  • Something like `df.groupby('A').first()`? This might be a duplicate. – cs95 Jun 05 '18 at 06:02
  • What is the final data that you are expecting? – Sudipta Mondal Jun 05 '18 at 06:02
  • Can you confirm what it is you're looking for? – cs95 Jun 05 '18 at 06:06
  • @coldspeed I added the sample output: just the non-null values from each row with the same id. – david nadal Jun 05 '18 at 06:37
  • @jezrael It's not a duplicate... how can it be the same? The questions ask two different things. I have updated the question with a sample output for clarity. – david nadal Jun 05 '18 at 06:38
  • @davidnadal - Do you think `df.replace('null', np.nan).groupby('A').first()`? – jezrael Jun 05 '18 at 06:40
  • @jezrael No, I don't want to replace null by np.nan. It's more like a merge between the rows with the same id after the groupby. I want to extract all the non-null values for each column after grouping by the same id, and then construct a single row for each group. – david nadal Jun 05 '18 at 06:43
  • OK, is it possible to have more non-NaN values, like `12, null, 15` in the last column? – jezrael Jun 05 '18 at 06:45
  • @jezrael It's possible; then I just want to take the first one, I guess, although I am sure that scenario will be quite rare in the data I have. Anyway, I was worried about that condition, and selecting the first value there will work. – david nadal Jun 05 '18 at 06:47
  • @davidnadal - But `df.groupby('A').first()` does not work? Why? I tested it and it works nicely. – jezrael Jun 05 '18 at 06:47
  • @jezrael http://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.core.groupby.GroupBy.first.html GroupBy.first just takes the first row of all the group rows with the same id, right? It doesn't take the first non-null values for each column, like here: https://stackoverflow.com/questions/30486417/pandas-how-do-i-select-first-row-in-each-group-by-group/30492281 – david nadal Jun 05 '18 at 06:49
  • @davidnadal - Hmmm, but it works as you need - it returns the first non-NaN. Try testing it on a small dataframe. – jezrael Jun 05 '18 at 06:50
  • @jezrael Okay, let me test on the dataset. Thanks! – david nadal Jun 05 '18 at 06:51
  • @jezrael No, it doesn't work; I tested on a CSV. – david nadal Jun 05 '18 at 08:25
  • Are the `null`s `NaN`s? – jezrael Jun 05 '18 at 08:25
  • @jezrael I have given an example too. – david nadal Jun 05 '18 at 08:27
  • So you need `df.replace('null', np.nan).groupby('A').first()` – jezrael Jun 05 '18 at 08:27
  • @jezrael It works on this example!! Could you explain as an answer maybe why this is happening? What does the np.nan replacement have to do with this? – david nadal Jun 05 '18 at 08:30
  • Because `first` gets the first non-NaN values per group. – jezrael Jun 05 '18 at 08:41
  • Nobody answered @davidnadal. Hi David, if you are still wondering how to solve your problem, here is the answer (I've tested it): `df = df.groupby('A').apply(lambda x: x.bfill().ffill())` then `df.groupby('A').last()` – Hernán Borré Apr 24 '19 at 08:10
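The two approaches from the comments can be sketched on the second example, this time assuming the nulls are real `NaN` values (the last comment's fill-based variant is adapted here to use `transform`, so the grouping column stays intact):

```python
import numpy as np
import pandas as pd

# Second example from the question, with real NaN missing values.
df = pd.DataFrame({
    'A': [1, 1, 2, 1, 2],
    'B': [np.nan, 90, np.nan, np.nan, 1000],
    'C': [np.nan, np.nan, 1000, 80, np.nan],
    'D': [100, np.nan, 999, np.nan, 799],
})

# jezrael's approach: first() already returns the first non-NaN
# value per column within each group.
via_first = df.groupby('A').first()

# Hernán Borré's approach, adapted: fill backward/forward within
# each group, then keep one row per group.
filled = df.groupby('A').transform(lambda col: col.bfill().ffill())
via_fill = filled.groupby(df['A']).last()
```

One difference worth noting: when a group has several non-null values in one column (D has both 999 and 799 for `A == 2`), `first()` keeps the first one, while the fill-then-`last()` variant keeps the last.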
