I wrote a function that takes three parameters: x, y, and z. I want to loop over them row by row. x takes a DataFrame with one column, y does the same, and z takes a DataFrame with multiple columns.

I tried this:

result = [f(x,y,z) for x,y,z in zip(df1["1com"], df2["1com"], df3["3com"])]

df1, df2, and df3 have the same index length.

This doesn't work because the list comprehension approach doesn't allow for multiple columns like this. I tried a bunch of things without success.

By the way, I found the list comprehension method here: How to iterate over rows in a DataFrame in Pandas

1 Answer

You could zip with individual columns of the multi-column DataFrame:

import pandas as pd

df1 = pd.DataFrame({"col_1": [1, 2, 3]})
df2 = pd.DataFrame({"col_1": [4, 5, 6]})
df3 = pd.DataFrame({"col_1": [7, 8, 9], "col_2": [10, 11, 12]})

def f(w, x, y, z):
    return sum([w, x, y, z])

result = [
    f(w, x, y, z)
    for w, x, y, z
    in zip(
        df1["col_1"], df2["col_1"],
        df3["col_1"], df3["col_2"]  # list all required df3 columns individually
    )
]
print(result)

Output:

[22, 26, 30]

Or you could join the DataFrames into a single one first:

df = df1.join(df2, lsuffix="_df1").join(df3, lsuffix="_df2")
print(df)
result = [
    f(w, x, y, z)
    for idx, (w, x, y, z)
    in df.iterrows()
]
print(result)

Output:

   col_1_df1  col_1_df2  col_1  col_2
0          1          4      7     10
1          2          5      8     11
2          3          6      9     12
[22, 26, 30]

Or you could convert df3 to a list of Series and "pivot" it using zip, so that z receives one tuple of df3 values per row:

def f(x, y, z):
    return x, y, z

result = [
    f(x, y, z)
    for x, y, z
    in zip(
        df1["col_1"],
        df2["col_1"],
        zip(*[df3[c] for c in df3.columns]))
]
print(result)

Output:

[(1, 4, (7, 10)), (2, 5, (8, 11)), (3, 6, (9, 12))]
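
If f treats every column the same way, a more flexible variation might be to define it with `*args` and unpack each df3 row inside the list comprehension. The sketch below is only an illustration: `f_flexible` and `result_flexible` are hypothetical names, and summing the values is an assumption carried over from the first example, not the asker's real function. It uses `itertuples(index=False, name=None)` to get one plain tuple per df3 row, so the number of df3 columns does not need to be known in advance.

```python
import pandas as pd

df1 = pd.DataFrame({"col_1": [1, 2, 3]})
df2 = pd.DataFrame({"col_1": [4, 5, 6]})
df3 = pd.DataFrame({"col_1": [7, 8, 9], "col_2": [10, 11, 12]})

# Hypothetical stand-in for the real function: accepts any number of values.
def f_flexible(*args):
    return sum(args)

result_flexible = [
    f_flexible(x, y, *z)  # unpack the df3 row into individual arguments
    for x, y, z in zip(
        df1["col_1"],
        df2["col_1"],
        # one plain tuple per row, however many columns df3 has
        df3.itertuples(index=False, name=None),
    )
]
print(result_flexible)
```

Output:

[22, 26, 30]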
Czaporka
  • Thanks for the quick response. In that case I have to change the function to accept more than 3 parameters. It is one solution I thought of. Right now my function parameter Y requires a DataFrame with multiple columns. The number of columns depends on the information I feed my code, so I prefer to have the flexibility. On the other hand, I would not have to change too much, and the flexibility is something I can do without. – Martijn van den Hoorn Oct 02 '20 at 15:05
  • Ok I think I've figured out a way and updated the answer, please check if the last example does what you wanted. Also, I don't know what your function does but if it treats all the columns in the same way, then perhaps the most flexible solution would be to define it as `def f(*args)`, which would make it accept an arbitrary number of args that would then be stored in a list. The inner tuple in each record could then be unpacked inside the list comprehension, and `f` would then just receive a flat list of numbers (according to the example). – Czaporka Oct 02 '20 at 18:33
  • Didn't work out. I have now edited my function in such a way that it doesn't require a DataFrame with multiple columns anymore. – Martijn van den Hoorn Oct 08 '20 at 16:51