13

I have a dataframe with two rows and I'd like to merge the two rows to one row. The df Looks as follows:

              PC           Rating CY   Rating PY    HT
0             DE101           NaN            AA     GV
0             DE101           AA+           NaN     GV

I have tried to create two seperate dataframes and Combine them with df.merge(df2) without success. The result should be the following

              PC           Rating CY   Rating PY    HT
0             DE101           AA+            AA     GV

Any ideas? Thanks in advance Could df.update be a possible solution?

EDIT:

df.head(1).combine_first(df.tail(1))

This works for the example above. However, for columns containing numerical values, this approach doesn't yield the desired output, e.g. for

              PC           Rating CY   Rating PY    HT    MV1   MV2
0             DE101           NaN            AA     GV    0     20 
0             DE101           AA+           NaN     GV    10    0

The output should be:

              PC           Rating CY   Rating PY    HT   MV1    MV2
0             DE101           AA+            AA     GV   10     20

The formula above doesn't sum up the values in the last two columns, but takes the values in the first row of the dataframe.

              PC           Rating CY   Rating PY    HT   MV1    MV2
0             DE101           AA+            AA     GV   0     20

How could this problem be fixed?

Carmen
  • 763
  • 2
  • 10
  • 22

2 Answers2

7

You could use max with transpose like

In [2103]: df.max().to_frame().T
Out[2103]:
      PC Rating CY Rating PY  HT MV1 MV2
0  DE101       AA+        AA  GV  10  20
Zero
  • 74,117
  • 18
  • 147
  • 154
4

You can make use of DF.combine_first() method after separating the DF into 2 parts where the null values in the first half would be replaced with the finite values in the other half while keeping it's other finite values untouched:

df.head(1).combine_first(df.tail(1))
# Practically this is same as → df.head(1).fillna(df.tail(1))

enter image description here


Incase there are columns of mixed datatype, partitioning them into it's constituent dtype columns and then performing various operations on it would be feasible by chaining them across.

obj_df = df.select_dtypes(include=[np.object])
num_df = df.select_dtypes(exclude=[np.object])

obj_df.head(1).combine_first(obj_df.tail(1)).join(num_df.head(1).add(num_df.tail(1)))

enter image description here

Nickil Maveli
  • 29,155
  • 8
  • 82
  • 85
  • Thanks, this works for the example above. However, if there are additional columns with numerical values, the rows can't be combined properly, see edit of my post – Carmen Jan 17 '17 at 12:24
  • I've updated my post. Since you hadn't mentioned the presence of numerical values in your dataframe, I assumed you wanted to just update the top row with the bottom one in case of any nulls found. – Nickil Maveli Jan 17 '17 at 12:46
  • 1
    Thanks a lot! Another way would be replacing the 0 values in the last two columns by NaN and then apply the first forumla you suggested – Carmen Jan 17 '17 at 13:13