How to merge multiple columns containing numeric data in Pandas, but ignore empty cells

Question

I have a table like this:

|-----|-----|-----|
|  A  |  B  |  C  |
|-----|-----|-----|
|     |  5  |     |
|-----|-----|-----|
|  1  |     |     |
|-----|-----|-----|
|     |  5  |     |
|-----|-----|-----|
|     |     |  2  |
|-----|-----|-----|
|     |     |  2  |
|-----|-----|-----|

where each column in the desired range has only one integer in its row. I want to merge these columns into a single new column that would look like this:

|-----|-----|-----|    |-----|
|  A  |  B  |  C  |    |  Z  |
|-----|-----|-----|    |-----|
|     |  5  |     | →  |  5  |
|-----|-----|-----|    |-----|
|  1  |     |     | →  |  1  |
|-----|-----|-----|    |-----|
|     |  5  |     | →  |  5  |
|-----|-----|-----|    |-----|
|     |     |  2  | →  |  2  |
|-----|-----|-----|    |-----|
|     |     |  2  | →  |  2  |
|-----|-----|-----|    |-----|

I have been searching, but the closest solution I can find is doing something like:

df.iloc[:,some_column:another_column].apply( lambda x: "".join(x.astype(str)), axis=1)

However, this also concatenates "NaN"s from the blank cells, which is obviously undesirable.

How might I get my desired output?

@cs95 that's perfect! Can you please add this as an answer so I may accept it? — Raj, Nov 03 '20 at 05:07
No worries, your question has been asked before, you can upvote any of the solutions [here](https://stackoverflow.com/questions/34989341/how-to-remove-nan-value-while-combining-two-column-in-panda-data-frame) although they are not as useful as `max(axis=1)` or `sum` because you are working with numeric data which affords you some shortcuts. — cs95, Nov 03 '20 at 05:08
@cs95 I think `max` is safer than `sum` (in case there are two columns that contain a value, for some weird reason) which is why I would prefer to accept your suggestion as the answer. The link provided here https://stackoverflow.com/questions/34989341/how-to-remove-nan-value-while-combining-two-column-in-panda-data-frame does not have `max` or `sum` as an answer, and merges strings, which is why I believe my question is unique enough to justify its own post. Thoughts? — Raj, Nov 03 '20 at 05:10
Fair enough, I've reopened the question because there is a difference in the type of data. I've edited my solution into the answer below, you can accept that instead. If you still feel like supporting me and want to learn more about pandas, I have written about various topics [here](https://github.com/Coldsp33d/stackoverflow-pandas-canonicals). — cs95, Nov 03 '20 at 05:13

score 2 · Accepted Answer · edited Nov 03 '20 at 05:12

2

I think it is what you want.

import pandas as pd
df = pd.DataFrame({"A":[np.nan, 1, np.nan, np.nan, np.nan],
                   "B": [5, np.nan, 5, np.nan, np.nan]})
df['Z'] = df.sum(axis = 1)

Alternatively, you can use

df['Z'] = df.max(axis = 1)

Which might be safer if (per chance) you have multiple non-NULL values and just want one of them (the largest one in this case).

edited Nov 03 '20 at 05:12

cs95

379,657
97
704
746

answered Nov 03 '20 at 05:02

Gilseung Ahn

2,598
1
4
11

How to merge multiple columns containing numeric data in Pandas, but ignore empty cells

1 Answers1