0

In pandas DataFrame, i need to create result column, that is result of comparison of two columns.

If in column SECOND - NaN, then in result column - FIRST.

If in column SECOND - not NaN, then in result column - SECOND.

first second
0 test Nan
1 test2 test3

->

result first second
0 test test Nan
1 test3 test2 test3
df = pandas.read_csv(file)
df["result"] = df["second"] if df["second"] else df["first"]
Pavel
  • 1
  • 1

2 Answers2

0
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'first': ['A', 'C'],
    'second': ['B', None]
})
df['result'] = np.where(
    df['second'].isna(),
    df['first'],
    df['second']
)
ThSorn
  • 507
  • 4
  • 7
  • 1
    Please don't post code-only answers. Future readers will be grateful to see explained *why* this answers the question instead of having to infer it from the code. – Gert Arnold May 26 '22 at 18:52
0

You can do this using the pandas apply function.

If your dataframe is,

df = pd.DataFrame({'first' : ['test', 'test2'], 'second' : [np.nan, 'test3']})
print(df)
    first   second
0   test    NaN
1   test2   test3

You can then use the apply function to create the 'result' column based on the other two columns:

df['result'] = df.apply(lambda row: row['first'] if pd.isna(row['second']) 
                                    else row['second'],
               axis=1)

The important part is axis=1 as that tells apply to use rows rather than columns.

If you'd prefer to not use a messy looking lambda function, the following code does the same thing:

def func(row):
    if pd.isna(row['second']):
        return row['first']
    else:
        return row['second']
df['result'] = df.apply(lambda row: func(row), axis=1)

Both of these produce the following dataframe:

    first   second  result
0   test    NaN     test
1   test2   test3   test3
Tom B.
  • 158
  • 9