3

Say we have a DataFrame df

df = pd.DataFrame({
    "Id": [1, 2],
    "Value": [2, 5]
})

df
    Id  Value
0   1   2
1   2   5

and some function f which takes an element of df and returns a DataFrame.

def f(value):
    return pd.DataFrame({"A": range(10, 10 + value), "B": range(20, 20 + value)})

f(2)
    A   B
0   10  20
1   11  21

We want to apply f to each element in df["Value"], and join the result to df, like so:

    Id  Value   A   B
0   1   2       10  20
1   1   2       11  21
2   2   5       10  20
2   2   5       11  21
2   2   5       12  22
2   2   5       13  23
2   2   5       14  24

In T-SQL, with a table df and table-valued function f, we would do this with a CROSS APPLY:

SELECT * FROM df
CROSS APPLY f(df.Value)

How can we do this in pandas?

Denziloe
  • 7,473
  • 3
  • 24
  • 34

2 Answers2

6

You could apply the function to each element in Value in a list comprehension and use pd.concat to concatenate all resulting dataframes. Also assign the corresponding Id so that it can be later on used to merge both dataframes:

l = pd.concat([f(row.Value).assign(Id=row.Id) for _, row in df.iterrows()])
df.merge(l, on='Id')

   Id   Value  A   B
0   1      2  10  20
1   1      2  11  21
2   2      5  10  20
3   2      5  11  21
4   2      5  12  22
5   2      5  13  23
6   2      5  14  24
yatu
  • 86,083
  • 12
  • 84
  • 139
  • 2
    Solid answer. I use this method a bit with similar files from different time periods to aggregate the data into a large data structure. – Mark Moretto Apr 17 '19 at 10:57
  • Thanks -- I was going to use an approach like this but I thought there might be a pandas method for it. Didn't know about `assign` -- handy. – Denziloe Apr 17 '19 at 14:16
  • You're welcome @Denziloe! Don't forget you can accept if it solved it for you :) – yatu Apr 17 '19 at 14:17
2

One of the few cases I would use DataFrame.iterrows. We can iterate over each row, concat the cartesian product out of your function with the original dataframe and at the same time fillna with bfill and ffill:


df = pd.concat([pd.concat([f(r['Value']), pd.DataFrame(r).T], axis=1).bfill().ffill() for _, r in df.iterrows()], 
               ignore_index=True)

Which yields:

print(df)
    A   B   Id  Value
0  10  20  1.0    2.0
1  11  21  1.0    2.0
2  10  20  2.0    5.0
3  11  21  2.0    5.0
4  12  22  2.0    5.0
5  13  23  2.0    5.0
6  14  24  2.0    5.0
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Erfan
  • 40,971
  • 8
  • 66
  • 78