Pandas - Duplicate rows on function application

Question

I have a dataframe, and I'm trying to apply a single function to that dataframe, with multiple arguments. I want the results of the function application to be stored in a new column, with each row duplicated to match each column, but I can't figure out how to do this.

Simple example:

df= pd.DataFrame({"a" : [4 ,5], "b" : [7, 8]},    index = [1, 2])

   a  b
1  4  7
2  5  8

Now, I want to add both the numbers 10 and 11 to column 'a', and store the results in a new column, 'c'. Sorry if this is unclear, but this is the result I'm looking for:

   a  b  c
1  4  7  14
2  4  7  15
3  5  8  15
4  5  8  16

Is there an easy way to do this?

jezrael · Accepted Answer · 2019-11-27T14:45:51.183

0

Use Index.repeat with numpy.tile:

df= pd.DataFrame({"a" : [4 ,5], "b" : [7, 8]},    index = [1, 2])

a  = [10,11]
df1 = (df.loc[df.index.repeat(len(a))]
         .assign(c = lambda x: x.a + np.tile(a, len(df)))
         .reset_index(drop=True)
         .rename(lambda x: x+1)
         )

Or:

df1 = df.loc[df.index.repeat(len(a))].reset_index(drop=True).rename(lambda x: x+1)

df1['c'] = df1.a + np.tile(a, len(df))
print (df1)
   a  b   c
1  4  7  14
2  4  7  15
3  5  8  15
4  5  8  16

Another idea is use cross join:

a  = [10,11]
df1 = df.assign(tmp=1).merge(pd.DataFrame({'c':a, 'tmp':1}), on='tmp').drop('tmp', 1)
df1['c'] += df1.a 
print (df1)

   a  b   c
0  4  7  14
1  4  7  15
2  5  8  15
3  5  8  16

edited Nov 27 '19 at 14:45

answered Nov 27 '19 at 14:34

jezrael

822,522
95
1,334
1,252

For a more general solution, would using zip in the assign function of the first solution be the best approach, or is there a better way? I used the addition as a simple example but let's say I wanted to apply any general function using the sets of values in column 'a' and array a, such as .assign(c = lambda x: [func(y, z) for y,z in zip(x.a, a)]) – upgrayedd Nov 27 '19 at 15:12
@upgrayedd - it depends. If perfromnce is important, then custom function rather not, but if not exist another solution you have to use them. – jezrael Nov 27 '19 at 15:14
@upgrayedd - also in pandas I think working with `list`s in pandas is not [good idea](https://stackoverflow.com/a/52563718/2901002), so first step should be create scalars columns, but all depends of data, sometimes not possible... – jezrael Nov 27 '19 at 15:15

score 0 · Answer 2 · answered Nov 27 '19 at 14:41

0

Using the explode method (pandas >= 0.25.0):

df1 = df.assign(c=df.apply(lambda row: [row.a+10, row.a+11], axis=1))
df1 = df1.explode('c')
print(df1)

   a  b   c
1  4  7  14
1  4  7  15
2  5  8  15
2  5  8  16

answered Nov 27 '19 at 14:41

Horace

1,024
7
12

score 0 · Answer 3 · answered Nov 27 '19 at 15:07

Note that your code example doesn't do what you say (5+10 = 15, not 16). The output from adding 10 and 11 is:

   a  b  c
1  4  7  14
2  4  7  15
3  5  8  15
4  5  8  16

That said, here's some understandable code:

def add_x_y_to_df_col(df, incol, outcol, x, y):
    df1 = df.copy()
    df[outcol] = df[incol] + x
    df1[outcol] = df[incol] + y
    return df.append(df1, ignore_index=True)

df = add_x_y_to_df_col(df, 'a',  'c', 10, 11)

Note this returns:

    a   b   c
0   4   7   14
1   5   8   15
2   4   7   15
3   5   8   16

If you want to sort by column a and restart the index at 1:

df = df.sort_values(by='a').reset_index(drop=True)
df.index += 1

(You could of course add that code to the function.) This gives the desired result:

    a   b   c
1   4   7   14
2   4   7   15
3   5   8   15
4   5   8   16

Pandas - Duplicate rows on function application

3 Answers3