
I have three DataFrames (df1, df2, df3) that are identically structured (same number and labels of rows and columns) but hold different values.

I want to populate each cell of df3 based on the values in the corresponding row and column of df1 and df2. I'm doing this with a for loop and a custom function:

for x in range(len(df3.columns)):
    df3.iloc[:, x] = customFunction(x)

I want to populate df3 using this custom IF/ELSE function:

def customFunction(y):
    # df1.iloc[:, y] and df2.iloc[:, y] are whole columns (Series), not single values
    if df1.iloc[:, y] != 1 and df2.iloc[:, y] == 0:
        return "NEW"
    elif df2.iloc[:, y] == 2:
        return "OLD"
    else:
        return "NEITHER"

I understand why I get an error message when I run this, but I can't figure out how to apply this function to a Series. I could do it row by row with more complex code, but I'm hoping there's a more efficient solution. I fear my approach is flawed.
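A minimal sketch of why the function above fails, using two small hypothetical Series (`col1`, `col2`) as stand-ins for `df1.iloc[:, y]` and `df2.iloc[:, y]`: `if` and `and` each need a single True/False, so pandas raises the "truth value is ambiguous" error, whereas `&` compares the two columns element by element.

```python
import pandas as pd

# Hypothetical stand-ins for df1.iloc[:, y] and df2.iloc[:, y]
col1 = pd.Series([0, 1, 3])
col2 = pd.Series([0, 0, 2])

try:
    # `and` calls bool() on the whole Series, which pandas refuses to do
    if (col1 != 1) and (col2 == 0):
        result = "NEW"
except ValueError as err:
    message = str(err)
    print(message)  # starts with "The truth value of a Series is ambiguous."

# `&` combines the two boolean Series element by element instead
mask = (col1 != 1) & (col2 == 0)
print(mask.tolist())
```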

crowsnest

2 Answers

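import numpy as np

# Work on the underlying NumPy arrays; all three frames have the same shape
v1 = df1.values
v2 = df2.values

# Elementwise conditions joined with & (not `and`); the inner np.where covers the elif/else branches
df3.loc[:] = np.where(
    (v1 != 1) & (v2 == 0), 'NEW',
    np.where(v2 == 2, 'OLD', 'NEITHER'))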
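A self-contained run of the same nested `np.where` approach on small hypothetical frames, as a sketch. It uses `to_numpy()` (the newer equivalent of `.values`) and builds df3 as a new DataFrame from the label array, so df3 does not need to exist beforehand.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: same shape, index and column labels
df1 = pd.DataFrame({"a": [0, 1, 0, 3], "b": [1, 5, 1, 2]})
df2 = pd.DataFrame({"a": [0, 0, 2, 1], "b": [0, 2, 0, 0]})

v1 = df1.to_numpy()
v2 = df2.to_numpy()

# if -> "NEW", elif -> "OLD", else -> "NEITHER", evaluated for every cell at once
labels = np.where((v1 != 1) & (v2 == 0), "NEW",
                  np.where(v2 == 2, "OLD", "NEITHER"))

# Wrap the label array in a DataFrame that reuses df1's row and column labels
df3 = pd.DataFrame(labels, index=df1.index, columns=df1.columns)
print(df3)
```

Because the outer condition is checked first, a cell where both conditions could apply gets "NEW", matching the order of the original if/elif.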
piRSquared
  • YES! This is awesome and so simple. Thanks piRSquared, this knowledge is going to be a real asset for me moving forward! – crowsnest Jul 20 '17 at 17:33

Yeah, try to avoid loops in pandas: they're inefficient, and pandas is built to take advantage of the underlying NumPy vectorization.

You want to use the apply function.

Something like:

df3['new_col'] = df3.apply(lambda x: customFunction(x))

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
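A sketch of how `apply` could be made to work here, assuming df1 and df2 share column labels. The one-liner above passes each column of df3 to `customFunction`, which expects an integer position, so this version instead applies a hypothetical `label_column` helper over df1 and looks up the matching df2 column by `col.name`. `apply` still loops over the columns in Python, but each column is labeled with vectorized comparisons.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with identical shape and labels
df1 = pd.DataFrame({"a": [0, 1, 0, 3], "b": [1, 5, 1, 2]})
df2 = pd.DataFrame({"a": [0, 0, 2, 1], "b": [0, 2, 0, 0]})

def label_column(col1):
    """Hypothetical helper: label one column of df1 using the same column of df2."""
    col2 = df2[col1.name]  # col1.name is the column label passed in by apply
    labels = np.where((col1 != 1) & (col2 == 0), "NEW",
                      np.where(col2 == 2, "OLD", "NEITHER"))
    return pd.Series(labels, index=col1.index)

# With the default axis=0, apply passes each column of df1 to label_column
df3 = df1.apply(label_column)
print(df3)
```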

tormond
  • Thanks tormond, though the error I'm getting is with the custom function, as I'm trying to use conditions on a Series. Here's the error message: "ValueError: ('The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" – crowsnest Jul 20 '17 at 16:53
  • So this is answered in plenty of places. https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o – tormond Jul 20 '17 at 16:57
  • I might be missing what you're trying to tell me, but that thread deals with filtering a DataFrame using columns from the same DataFrame. I'm trying to use conditions to populate a complete column using values from other DataFrames. – crowsnest Jul 20 '17 at 17:10
  • The problem is with the conditional. From the first sentence of that answer "The or and and python statements require truth-values. For pandas these are considered ambiguous so you should use "bitwise" | (or) or & (and) operations" – tormond Jul 20 '17 at 17:12
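Applying that advice to the question's own loop, as a sketch: it keeps the for loop and `customFunction`, replaces `and` with `&`, and swaps the if/elif/else for `np.select`, which returns a whole column of labels rather than a single string. The sample data is hypothetical, and `np.select` is a substitution not mentioned in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; df3 starts empty (object dtype) with the same labels
df1 = pd.DataFrame({"a": [0, 1, 0, 3], "b": [1, 5, 1, 2]})
df2 = pd.DataFrame({"a": [0, 0, 2, 1], "b": [0, 2, 0, 0]})
df3 = pd.DataFrame(index=df1.index, columns=df1.columns, dtype=object)

def customFunction(y):
    c1 = df1.iloc[:, y]
    c2 = df2.iloc[:, y]
    # One boolean condition per if/elif branch, combined with & instead of `and`
    conditions = [(c1 != 1) & (c2 == 0), c2 == 2]
    choices = ["NEW", "OLD"]
    # The first matching condition wins; default plays the role of the else branch
    return np.select(conditions, choices, default="NEITHER")

for x in range(len(df3.columns)):
    df3.iloc[:, x] = customFunction(x)

print(df3)
```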