pandas version of SQL CROSS APPLY

Question

Say we have a DataFrame df

df = pd.DataFrame({
    "Id": [1, 2],
    "Value": [2, 5]
})

df
    Id  Value
0   1   2
1   2   5

and some function f which takes an element of df and returns a DataFrame.

def f(value):
    return pd.DataFrame({"A": range(10, 10 + value), "B": range(20, 20 + value)})

f(2)
    A   B
0   10  20
1   11  21

We want to apply f to each element in df["Value"], and join the result to df, like so:

    Id  Value   A   B
0   1   2       10  20
1   1   2       11  21
2   2   5       10  20
2   2   5       11  21
2   2   5       12  22
2   2   5       13  23
2   2   5       14  24

In T-SQL, with a table df and table-valued function f, we would do this with a CROSS APPLY:

SELECT * FROM df
CROSS APPLY f(df.Value)

How can we do this in pandas?

score 6 · Accepted Answer · answered Apr 17 '19 at 10:44

6

You could apply the function to each element in Value in a list comprehension and use pd.concat to concatenate all resulting dataframes. Also assign the corresponding Id so that it can be later on used to merge both dataframes:

l = pd.concat([f(row.Value).assign(Id=row.Id) for _, row in df.iterrows()])
df.merge(l, on='Id')

   Id   Value  A   B
0   1      2  10  20
1   1      2  11  21
2   2      5  10  20
3   2      5  11  21
4   2      5  12  22
5   2      5  13  23
6   2      5  14  24

answered Apr 17 '19 at 10:44

yatu

86,083
12
84
139

2

Solid answer. I use this method a bit with similar files from different time periods to aggregate the data into a large data structure. – Mark Moretto Apr 17 '19 at 10:57
Thanks -- I was going to use an approach like this but I thought there might be a pandas method for it. Didn't know about `assign` -- handy. – Denziloe Apr 17 '19 at 14:16
You're welcome @Denziloe! Don't forget you can accept if it solved it for you :) – yatu Apr 17 '19 at 14:17

score 2 · Answer 2 · edited Dec 18 '19 at 18:28

One of the few cases I would use DataFrame.iterrows. We can iterate over each row, concat the cartesian product out of your function with the original dataframe and at the same time fillna with bfill and ffill:

df = pd.concat([pd.concat([f(r['Value']), pd.DataFrame(r).T], axis=1).bfill().ffill() for _, r in df.iterrows()], 
               ignore_index=True)

Which yields:

print(df)
    A   B   Id  Value
0  10  20  1.0    2.0
1  11  21  1.0    2.0
2  10  20  2.0    5.0
3  11  21  2.0    5.0
4  12  22  2.0    5.0
5  13  23  2.0    5.0
6  14  24  2.0    5.0

pandas version of SQL CROSS APPLY

2 Answers2