3

I am looking to create a column in a pandas DataFrame that is a function of a variable (dynamic) list of column names.

Typical column creation would be:

df['new']=(df['one']*x)+(df['two']*y)+(df['three']*z)

where x, y, z are coefficients taken from another DataFrame:

x 1.1
y 1.2
z 1.3
a 1.4
b 1.5
c 1.6

I want to create a column which would be a function of a variable list of columns.

So for instance if:

cols=['one','two']

then the formula would be created as:

df['new']=(df['one']*x)+(df['two']*y)

But if cols changes to:

cols=['one','two','three','four']

then the formula would change to:

df['new']=(df['one']*x)+(df['two']*y)+(df['three']*z)+(df['four']*a)

I know I must be missing something easy here.

MaxU - stand with Ukraine
clg4

2 Answers

4

Try this:

cols=['one', 'two']
df['new'] = df[cols].sum(axis=1)
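
For reference, here is a minimal self-contained sketch of the line above. The sample values and the column names other than those in cols are made up for illustration; note that this covers only the plain (unweighted) sum, not the per-column multipliers.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({"one": [1, 2], "two": [10, 20], "three": [100, 200]})

cols = ["one", "two"]

# Select only the listed columns, then add them up row by row (axis=1).
df["new"] = df[cols].sum(axis=1)

print(df["new"].tolist())  # [11, 22]
```

Changing cols to a longer or shorter list changes which columns are summed, with no change to the formula itself.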
MaxU - stand with Ukraine
  • I think you may be on to something. Can I use this with my edited change adding multiplication of another variable? – clg4 Mar 05 '16 at 18:31
  • @clg4, sorry, I don't get your question... Can you make an example? – MaxU - stand with Ukraine Mar 05 '16 at 18:32
  • same question, I just want to multiply each variable column by another variable, so the equation would be: df['new']=(df['one']*x)+(df['two']*y)+(df['three']*z)+(df['four']*a) – clg4 Mar 05 '16 at 18:50
  • @clg4, with your last edit you've completely changed your original question, so my answer doesn't apply any longer. Please in future mark it as an update and add it to the original question. – MaxU - stand with Ukraine Mar 05 '16 at 18:57
  • @clg4, check Alexander's solution – MaxU - stand with Ukraine Mar 05 '16 at 18:58
  • thanks. When you gave me your first answer I realized I needed to change. I edited, and strangely it did not note that I edited, despite me adding the edit notes at the bottom... Thanks for the help. – clg4 Mar 05 '16 at 19:00
4

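zip stops at the end of its shortest input, so [(a, b) for a, b in zip([1, 2], [3, 4, 5, 6])] returns [(1, 3), (2, 4)]. Pairing cols with the full list of coefficients therefore uses only as many coefficients as there are columns.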

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))

x = 1.1
y = 1.2
z = 1.3
a = 1.4
b = 1.5
c = 1.6

var = [x, y, z, a, b, c]
cols = ['A', 'B', 'C']

>>> sum(df[col] * v for col, v in zip(cols, var))
0    0.729284
1    2.671124
2    1.804285
3    0.791489
4    1.818327
dtype: float64
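
If the coefficients live in another pandas object, as the question describes, the same weighted sum can also be sketched as a dot product. This is only one possible variant, under stated assumptions: the coefficients are held in a Series named coef (a hypothetical name) indexed x, y, z, a, b, c, they are matched to cols by position, and weighted_sum is an illustrative helper rather than a pandas function. The sample data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({"one": [1.0, 2.0], "two": [10.0, 20.0], "three": [100.0, 200.0]})

# Assumption: the coefficients from the "other df" are available as a Series.
coef = pd.Series([1.1, 1.2, 1.3, 1.4, 1.5, 1.6], index=list("xyzabc"))


def weighted_sum(frame, cols, coef):
    """Illustrative helper: sum of frame[col] * coefficient, matched by position."""
    # Take the first len(cols) coefficients as a plain array so that pandas
    # does not try to align the labels x, y, z with the column names.
    weights = coef.to_numpy()[:len(cols)]
    return frame[cols].dot(weights)


df["new"] = weighted_sum(df, ["one", "two"], coef)
print(df["new"].round(1).tolist())  # [13.1, 26.2]
```

One difference from the zip version: if cols is longer than the list of coefficients, zip silently drops the extra columns, whereas the dot product raises an error because the shapes no longer match.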
Alexander