
After merging two data frames:

output = pd.merge(df1, df2, on='ID', how='outer')

I have a data frame like this:

index  x    y   z
  0    2   NaN  3
  0   NaN   3   3
  1    2   NaN  4
  1   NaN   3   4
...

How can I merge rows that share the same index into a single row? Expected output:

index  x   y  z
  0    2   3  3
  1    2   3  4
Zero
bartblons
  • What happens if the values in `z` differ? Does that ever happen? – IanS Jul 18 '17 at 09:52
  • You're merging on 'ID', but it isn't anywhere in your dataframe. Feels like we're missing some data. – elPastor Jul 18 '17 at 09:53
  • This is just an example; in my actual code I do have this column. – bartblons Jul 18 '17 at 09:54
  • @IanS Yes, it still happens when the values in 'z' are different. – bartblons Jul 18 '17 at 09:56
  • This is a very simple question (in terms of reproducibility). If you want the best answers, always post a reproducible question, and include the original dataframes as well. – elPastor Jul 18 '17 at 10:03
  • I entirely agree with @pshep. You will get much better and more precise answers if you provide reproducible input data sets and the desired output. [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU - stand with Ukraine Jul 18 '17 at 10:20

2 Answers


Perhaps you could take the mean of each group. mean() skips NaN by default, so each group collapses to its non-null value:

In [418]: output.groupby('index', as_index=False).mean()
Out[418]:
   index    x    y  z
0      0  2.0  3.0  3
1      1  2.0  3.0  4
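
For anyone who wants to reproduce this, here is a minimal sketch. The frame is reconstructed from the sample in the question, and it assumes `index` is an ordinary column rather than the DataFrame's index; the variable name `merged` is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: 'index' is a regular column).
output = pd.DataFrame({
    "index": [0, 0, 1, 1],
    "x": [2, np.nan, 2, np.nan],
    "y": [np.nan, 3, np.nan, 3],
    "z": [3, 3, 4, 4],
})

# mean() ignores NaN, so each group reduces to its single non-null x and y.
merged = output.groupby("index", as_index=False).mean()
print(merged)
```

Note that this only gives the expected result when each column has at most one distinct non-null value per group; otherwise the values are averaged.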
Zero

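We can group the DataFrame by 'index' and then take the first non-null value in each group with .first(), the minimum with .min(), and so on, depending on the case. With the sample data all of these give the same result, because each column has only one distinct non-null value per group. What do you want to get if the values in z differ?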

In [28]: gr = df.groupby('index', as_index=False)

In [29]: gr.first()
Out[29]:
   index    x    y  z
0      0  2.0  3.0  3
1      1  2.0  3.0  4

In [30]: gr.max()
Out[30]:
   index    x    y  z
0      0  2.0  3.0  3
1      1  2.0  3.0  4

In [31]: gr.min()
Out[31]:
   index    x    y  z
0      0  2.0  3.0  3
1      1  2.0  3.0  4

In [32]: gr.mean()
Out[32]:
   index    x    y  z
0      0  2.0  3.0  3
1      1  2.0  3.0  4
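
To see where the choice of aggregation matters, here is a small sketch with a hypothetical variation of the sample data in which `z` differs within group 0 (3 and 5). The data and the result variable names are made up for illustration, and `index` is again assumed to be an ordinary column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same shape as the question's sample, but z differs in group 0.
df = pd.DataFrame({
    "index": [0, 0, 1, 1],
    "x": [2, np.nan, 2, np.nan],
    "y": [np.nan, 3, np.nan, 3],
    "z": [3, 5, 4, 4],
})

gr = df.groupby("index", as_index=False)

first = gr.first()    # first non-null value per column in each group
largest = gr.max()    # largest value per column, NaN skipped
smallest = gr.min()   # smallest value per column, NaN skipped
average = gr.mean()   # mean per column, NaN skipped

# x and y agree across all four; only z depends on the aggregation.
print(first["z"].tolist(), largest["z"].tolist(),
      smallest["z"].tolist(), average["z"].tolist())
```

So x and y come out the same whichever method is used, while z for group 0 is 3 with .first() or .min(), 5 with .max(), and 4 with .mean().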
Sevanteri