4

If I have a pandas dataframe like so:

a1      0.116667          NaN           NaN
a2           NaN     0.516667           NaN
a3           NaN     0.006667           NaN
a4           NaN          NaN      0.426667
a5           NaN     0.506667           NaN
a6      0.583333          NaN           NaN
a7      0.550000          NaN           NaN

and I want to combine the columns so that if there is a number in any of the columns and NaN in the other two the result is one column, with the expected output:

a1   0.116667
a2   0.516667
a3   0.006667
a4   0.426667
a5   0.506667
a6   0.583333
a7   0.550000
Aaron
  • 2,367
  • 3
  • 25
  • 33
  • 1
    If there is only one value, you can use [`DataFrame.sum(axis=1)`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sum.html) to compute the _sum_ of each row. – David Gasquez Feb 11 '16 at 21:58
  • Why couldn't you just sum them? You don't provide an example where you have two or three numbers in a given row. – Alexander Feb 11 '16 at 22:09
  • @Alexander because there is never the case, at least in my data –  Feb 11 '16 at 22:14
  • In which case simply use `df.sum(axis=1)` – Alexander Feb 11 '16 at 22:15

2 Answers2

2

You didn't specify what should happen if, for some row, there are two or more values that are not NaNs.

Subject to what you want in this case, a simple row-wise maximum might solve your problem:

df = pd.DataFrame({
    'a': [1, None, None], 
    'b': [None, 3, None],
    'c': [None, None, 4]})
>>> df
    a   b   c
0   1   NaN     NaN
1   NaN     3   NaN
2   NaN     NaN     4

Now, taking the row-wise maximum:

>>> df.max(axis=1)
0    1
1    3
2    4
dtype: float64
Ami Tavory
  • 74,578
  • 11
  • 141
  • 185
1

You can create a mask where the number of null values in a give row equals two. Then use the mask to sum the other columns.

df = pd.DataFrame({'cat': ['a' + str(i) for i in range(1, 8)], 
                   'col1': [0.116667, np.nan, np.nan, np.nan, np.nan, 0.583333, 0.550000], 
                   'col2': [np.nan, 0.516667, 0.006667, np.nan, 0.506667, np.nan, np.nan], 
                   'col3': [np.nan, np.nan, np.nan, 0.426667, np.nan, np.nan, np.nan]})

>>> df
  cat      col1      col2      col3
0  a1  0.116667       NaN       NaN
1  a2       NaN  0.516667       NaN
2  a3       NaN  0.006667       NaN
3  a4       NaN       NaN  0.426667
4  a5       NaN  0.506667       NaN
5  a6  0.583333       NaN       NaN
6  a7  0.550000       NaN       NaN

mask = df.isnull().sum(axis=1) == 2
df.loc[mask, 'col1'] = df[mask].sum(axis=1)

>>> df
  cat      col1      col2      col3
0  a1  0.116667       NaN       NaN
1  a2  0.516667  0.516667       NaN
2  a3  0.006667  0.006667       NaN
3  a4  0.426667       NaN  0.426667
4  a5  0.506667  0.506667       NaN
5  a6  0.583333       NaN       NaN
6  a7  0.550000       NaN       NaN
Alexander
  • 105,104
  • 32
  • 201
  • 196