-1

I have a Pandas DataFrame of 23 columns and 1119 rows.

Here is the issue, columns 13, 14, 20 and 21 are of float dtype.

If data in column 13 and 14 is nan, then they are present in 20 and 21, and vice versa.

I want to create a column, if value is missing, get from the other.

Example: column 13 and 14 is nan then get value from 20 and 21.

Here is what I came up with, I created a function and iterated using itertuples

def AP_calc(df):
    for i in df.itertuples():
        if i[20]==np.nan & i[21]==np.nan:
           pool = i[13] + i[14]
        else:
            pool = i[20] + i[21]
        return pool

then used an apply function but this does not work.

df["test"] = df[['AP in %','AP_M in %','FixP in €','FixP C in €']].apply(AP_calc,axis=1)

I have tried other methods too but not working, please help me out, please

JA-pythonista
  • 1,225
  • 1
  • 21
  • 44
  • share a sample of the dataframe, with only the four affected columns. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – sammywemmy Feb 21 '20 at 06:07
  • I just did, please check – JA-pythonista Feb 21 '20 at 06:12
  • You likely did not read @sammywemmy's link before you posted that, because that is not what was asked for. Please also read [Why not upload images of code on SO when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question) to see why. – Amadan Feb 21 '20 at 06:14
  • Is column `20` called `FixP in €` or `AP in %` ? – jezrael Feb 21 '20 at 06:19

2 Answers2

1

Use numpy.where with mask created by Series.isna:

m = df['FixP in €'].isna() & df['FixP C in €'].isna()
df["test"] = np.where(m, df['AP in %'] + df['AP_M in %'], df['FixP in €'] + df['FixP C in €'])

Or:

c1 = ['FixP in €','FixP C in €']
c2 = ['AP in %','AP_M in %']

m = df[c2].isna().all(axis=1)
df["test"] = np.where(m, df[c1].sum(axis=1), df[c2].sum(axis=1))

Alternative with selecting by positions with DataFrame.iloc:

c1 = [20,21]
c2 = [13,14]

m = df.iloc[:, c2].isna().all(axis=1)
df["test"] = np.where(m, df.iloc[:, c1].sum(axis=1), df.iloc[:, c2].sum(axis=1))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Just let the columns fill the N/A places of each other:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Bob': [10, 30, np.nan], 
                   'Alice': [13, np.nan, 40]})

df['Bob'].fillna(df['Alice'], inplace=True)
df['Alice'].fillna(df['Bob'], inplace=True)
Amadan
  • 191,408
  • 23
  • 240
  • 301