Is it possible to iterate over a dataframe and create new columns based on operations performed on existing columns?

For instance, suppose my existing dataframe has 4 columns: a, b, c, d.

I want to create new columns adding a and b, then a and c, then a and d, then b and c, then b and d, then c and d.

I know you can create each new column manually, but the actual project I am working on has many more columns, so I am wondering if it can be done with a for loop.

Thanks.

  • Welcome to Stack Overflow! You might want to check out [this post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) on how to write good pandas examples. Good, reproducible examples will help people answer your questions. – Darren Dec 13 '19 at 19:13
  • What kind of data are you dealing with? NumPy may be more appropriate. – AMC Dec 13 '19 at 20:06

1 Answer

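For summation, yes: you can do it with NumPy broadcasting, which computes every pairwise sum in one step. Note that this yields all ordered pairs, including each column added to itself. For a general function, you may want to write a loop.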

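import numpy as np
import pandas as pd

# Sample frame, reconstructed to match the output below
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))

vals = df.to_numpy()

# New column names: every ordered pair (first column, second column)
cols = pd.MultiIndex.from_product([df.columns, df.columns])

# Broadcast (rows, n, 1) + (rows, 1, n) to get all pairwise sums,
# then flatten the last two axes into one column per pair
out = pd.DataFrame((vals[:, :, None] + vals[:, None, :]).reshape(len(df), -1),
                   index=df.index,
                   columns=cols)
print(out)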

Output:

    a               b               c               d            
    a   b   c   d   a   b   c   d   a   b   c   d   a   b   c   d
0   0   1   2   3   1   2   3   4   2   3   4   5   3   4   5   6
1   8   9  10  11   9  10  11  12  10  11  12  13  11  12  13  14
2  16  17  18  19  17  18  19  20  18  19  20  21  19  20  21  22
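
If only the six unordered pairs from the question are wanted (a+b, a+c, a+d, b+c, b+d, c+d), or the operation is something other than addition, a loop over `itertools.combinations` is one way to do it. The following is a minimal sketch rather than a definitive implementation: the helper name `add_pairwise`, its `func` and `sep` parameters, and the `a_b` column naming scheme are all hypothetical choices made for illustration, and the sample frame is assumed to match the output above.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Sample frame (an assumption, chosen to match the output shown above).
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))


def add_pairwise(frame, func=lambda x, y: x + y, sep="_"):
    """Return a copy of `frame` with one new column per unordered column pair.

    `func` receives the two columns as Series; the default adds them.
    """
    new_cols = {
        f"{left}{sep}{right}": func(frame[left], frame[right])
        for left, right in combinations(frame.columns, 2)
    }
    # Build all new columns first, then attach them with a single concat.
    return pd.concat([frame, pd.DataFrame(new_cols, index=frame.index)], axis=1)


result = add_pairwise(df)
print(result)
```

Passing a different two-argument callable as `func`, for example `lambda x, y: x * y`, applies that operation to each pair instead. Collecting the new columns in a dict and concatenating once avoids inserting columns one at a time, which matters more as the number of columns grows.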
Quang Hoang