1

I have a table like this:

|-----|-----|-----|
|  A  |  B  |  C  |
|-----|-----|-----|
|     |  5  |     |
|-----|-----|-----|
|  1  |     |     |
|-----|-----|-----|
|     |  5  |     |
|-----|-----|-----|
|     |     |  2  |
|-----|-----|-----|
|     |     |  2  |
|-----|-----|-----|

where each column in the desired range has only one integer in its row. I want to merge these columns into a single new column that would look like this:

|-----|-----|-----|    |-----|
|  A  |  B  |  C  |    |  Z  |
|-----|-----|-----|    |-----|
|     |  5  |     | →  |  5  |
|-----|-----|-----|    |-----|
|  1  |     |     | →  |  1  |
|-----|-----|-----|    |-----|
|     |  5  |     | →  |  5  |
|-----|-----|-----|    |-----|
|     |     |  2  | →  |  2  |
|-----|-----|-----|    |-----|
|     |     |  2  | →  |  2  |
|-----|-----|-----|    |-----|

I have been searching, but the closest solution I can find is doing something like:

df.iloc[:,some_column:another_column].apply( lambda x: "".join(x.astype(str)), axis=1)

However, this also concatenates "NaN"s from the blank cells, which is obviously undesirable.

How might I get my desired output?

cs95
  • 379,657
  • 97
  • 704
  • 746
Raj
  • 1,555
  • 18
  • 32
  • `df.max(axis=1)` – cs95 Nov 03 '20 at 05:02
  • @cs95 that's perfect! Can you please add this as an answer so I may accept it? – Raj Nov 03 '20 at 05:07
  • No worries, your question has been asked before, you can upvote any of the solutions [here](https://stackoverflow.com/questions/34989341/how-to-remove-nan-value-while-combining-two-column-in-panda-data-frame) although they are not as useful as `max(axis=1)` or `sum` because you are working with numeric data which affords you some shortcuts. – cs95 Nov 03 '20 at 05:08
  • 1
    @cs95 I think `max` is safer than `sum` (in case there are two columns that contain a value, for some weird reason) which is why I would prefer to accept your suggestion as the answer. The link provided here https://stackoverflow.com/questions/34989341/how-to-remove-nan-value-while-combining-two-column-in-panda-data-frame does not have `max` or `sum` as an answer, and merges strings, which is why I believe my question is unique enough to justify its own post. Thoughts? – Raj Nov 03 '20 at 05:10
  • Fair enough, I've reopened the question because there is a difference in the type of data. I've edited my solution into the answer below, you can accept that instead. If you still feel like supporting me and want to learn more about pandas, I have written about various topics [here](https://github.com/Coldsp33d/stackoverflow-pandas-canonicals). – cs95 Nov 03 '20 at 05:13

1 Answers1

2

I think it is what you want.

import pandas as pd
df = pd.DataFrame({"A":[np.nan, 1, np.nan, np.nan, np.nan],
                   "B": [5, np.nan, 5, np.nan, np.nan]})
df['Z'] = df.sum(axis = 1)

Alternatively, you can use

df['Z'] = df.max(axis = 1)

Which might be safer if (per chance) you have multiple non-NULL values and just want one of them (the largest one in this case).

cs95
  • 379,657
  • 97
  • 704
  • 746
Gilseung Ahn
  • 2,598
  • 1
  • 4
  • 11