0

I am new to Python and i can solve the following problem. I have a dataframe that looks like that:

ID      X1     x2     x3
1       15     NaN    NaN
2       NaN    2      NaN
3       NaN    NaN    5
1       NaN    16     NaN
2       1      NaN    NaN
3       6      NaN    NaN
4       NaN    NaN    75
5       NaN    67     NaN

I want to merge the rows by ID, as a result it should look like that:

ID    x1    x2   x3
1     15    16   NaN
2     1     2    NaN
3     6     NaN  5
4     NaN   NaN  75
5     NaN   67   NaN

I have tryed a lot with df.groupby("ID"), without success. Can someone fix that for me an supply the code for me. Thx

Tobi_93
  • 3
  • 4

2 Answers2

0

Try this:

df1 = df.groupby('ID',as_index=False,sort=False).last()
Dejene T.
  • 973
  • 8
  • 14
0

You can change your existing groupby like this. You can remove replace part if you would like 0.0 instead of NaN:

import numpy as np
df = df.fillna(0).astype(int).groupby('ID').sum().replace(0,np.nan)
print(df)

Output:

      
ID    X1    x2    x3
1   15.0  16.0   NaN
2    1.0   2.0   NaN
3    6.0   NaN   5.0
4    NaN   NaN  75.0
5    NaN  67.0   NaN

If you don't want ID as index, you can add reset_index:

import numpy as np
df = df.fillna(0).astype(int).groupby('ID').sum().replace(0,np.nan).reset_index()
print(df)

Output:

   ID    X1    x2    x3
0   1  15.0  16.0   NaN
1   2   1.0   2.0   NaN
2   3   6.0   NaN   5.0
3   4   NaN   NaN  75.0
4   5   NaN  67.0   NaN
Grayrigel
  • 3,474
  • 5
  • 14
  • 32
  • it works, thank you very much! But all the ID´s (not shown) with value NaN in x1 will be deleted, that should not be happening. Do you also have a solution for that, @Grayrigel? – Tobi_93 Oct 22 '20 at 12:25
  • Can you share the real input data? – Grayrigel Oct 22 '20 at 12:33
  • @Grayrigel - use this - https://stackoverflow.com/questions/18429491/pandas-groupby-columns-with-nan-missing-values – jezrael Oct 22 '20 at 12:34
  • No sadly not, but i edited the question. Now it should be clearer. Sorry it was my mistake. – Tobi_93 Oct 22 '20 at 12:37
  • @jezrael Thanks for the help. Really appreciate it. I thought of using `sum` . However, OP wanted to keep `NaN` so, couldn't thing anything than replacing `0.0` with `NaN`. – Grayrigel Oct 22 '20 at 13:09
  • @Tobi_93 Please Check the updated answer. Let me know if it works. – Grayrigel Oct 22 '20 at 13:20
  • @Grayrigel, thanks for the update. No, x1 will be deleted or not shown then. ID x2 and x3 is still there. – Tobi_93 Oct 22 '20 at 13:54
  • @Tobi_93 Can you try now? Hopefully this should resolve the problem. – Grayrigel Oct 22 '20 at 16:40
  • The ID column is alwas filled with data, so there are no missing ones, just duplicates, @jezrael. Grayrigel, i have tried it, still the same issue. Sorry, thanks for you help but do not spend to much time on it. – Tobi_93 Oct 23 '20 at 06:56
  • @Grayrigel i found out that x1 is in the orginal data dtype = object, i think thats why the sum() is not working. – Tobi_93 Oct 23 '20 at 07:13
  • @Tobi_93 Updated my answer once more. This convert x1 dtype to int. It should work now. If it doesn't work can you check the initial `fillna` is working or not. – Grayrigel Oct 23 '20 at 07:22
  • Great, thats the solution! Thank you very much for your effort! Really appreciate it! – Tobi_93 Oct 23 '20 at 07:42