Create Formula Based on Dynamically Changing Columns to Set Values in Pandas Dataframe Column

Question

I want to create a column in a pandas dataframe that is a function of a variable (dynamic) list of column names.

Typical column creation would be:

df['new']=(df['one']*x)+(df['two']*y)+(df['three']*z)

where x,y,z are variables from another df.

x 1.1
y 1.2
z 1.3
a 1.4
b 1.5
c 1.6

I want to create a column which would be a function of a variable list of columns.

So for instance if:

cols=['one','two']

then the formula would be created as:

df['new']=(df['one']*x)+(df['two']*y)

But if cols changes to:

cols=['one','two','three','four']

then the formula would change to:

df['new']=(df['one']*x)+(df['two']*y)+(df['three']*z)+(df['four']*a)

I know I must be missing something easy here.

check this: http://stackoverflow.com/questions/25748683/python-pandas-sum-dataframe-rows-for-given-columns — MaxU - stand with Ukraine, Mar 05 '16 at 18:18

score 4 · Answer 1 · answered Mar 05 '16 at 18:22

try this:

cols=['one', 'two']
df['new'] = df[cols].sum(axis=1)
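
For illustration, here is a minimal sketch of the same idea on a small made-up frame. The sample values are assumptions, not data from the question, and this covers only the plain (unweighted) sum:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "one": [1, 2, 3],
    "two": [10, 20, 30],
    "three": [100, 200, 300],
})

cols = ["one", "two"]  # this list may change at runtime

# Select only the listed columns, then add them up row by row.
df["new"] = df[cols].sum(axis=1)

print(df["new"].tolist())  # [11, 22, 33]
```

Changing cols to ['one', 'two', 'three'] and re-running the assignment would include the third column without touching the formula.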

MaxU - stand with Ukraine

I think you may be on to something. Can I use this with my edited change adding multiplication of another variable? – clg4 Mar 05 '16 at 18:31
@clg4, sorry i don't get your question... Can you make an example? – MaxU - stand with Ukraine Mar 05 '16 at 18:32
same question, I just want to multiply the variable column by another variable so the equation would be:df['new']=(df['one']*x)+(df['two']*y)+(df['3']*z)+(df['four']*a) – clg4 Mar 05 '16 at 18:50
@clg4, with your last edit you've completely changed your original question, so my answer doesn't apply any longer. Please in future mark it as an update and add it to the original question. – MaxU - stand with Ukraine Mar 05 '16 at 18:57
@clg4, check Alexander's solution – MaxU - stand with Ukraine Mar 05 '16 at 18:58
thanks. When you gave me your first answer I realized I needed to change. I edited, and strangely it did not note that I edited, despite me adding the edit notes at the bottom... Thanks for the help. – clg4 Mar 05 '16 at 19:00

score 4 · Accepted Answer · answered Mar 05 '16 at 18:42

zip stops at the shorter of its inputs, so [(a, b) for a, b in zip([1, 2], [3, 4, 5, 6])] returns [(1, 3), (2, 4)]. Pairing cols with the full list of coefficients therefore uses only as many coefficients as there are columns:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))

x = 1.1
y = 1.2
z = 1.3
a = 1.4
b = 1.5
c = 1.6

var = [x, y, z, a, b, c]
cols = ['A', 'B', 'C']

>>> sum(df[col] * v for col, v in zip(cols, var))
0    0.729284
1    2.671124
2    1.804285
3    0.791489
4    1.818327
dtype: float64
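
Because the output above depends on random data, a deterministic sketch may be easier to check by hand. The helper name weighted_sum, the sample values, and the coefficients below are all assumptions made for illustration; the technique is the same zip-based sum, assigned to a new column as in the question:

```python
import pandas as pd

# Hypothetical data chosen so the arithmetic is easy to verify by hand.
df = pd.DataFrame({
    "one": [1.0, 2.0],
    "two": [10.0, 20.0],
    "three": [100.0, 200.0],
})

coeffs = [2.0, 3.0, 4.0, 5.0]  # stand-ins for x, y, z, a


def weighted_sum(frame, cols, coeffs):
    """Sum frame[col] * coefficient over cols, pairing them by position."""
    # zip stops at the shorter sequence, so surplus coefficients are ignored.
    return sum(frame[col] * v for col, v in zip(cols, coeffs))


df["new"] = weighted_sum(df, ["one", "two"], coeffs)
print(df["new"].tolist())  # [32.0, 64.0]

df["new"] = weighted_sum(df, ["one", "two", "three"], coeffs)
print(df["new"].tolist())  # [432.0, 864.0]
```

Note that the pairing is purely positional: the first column in cols gets the first coefficient, and so on, so the order of cols matters.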

Alexander

The generator expression can be replaced by broadcasting: `(df[cols] * var[:len(cols)]).sum(axis=1)` works too. – unutbu Mar 05 '16 at 18:53
perfect and thanks! million upvotes. Also thanks MaxU, sorry for the edit. – clg4 Mar 05 '16 at 19:08
@clg4, check unutbu's comment - IMO it will work faster – MaxU - stand with Ukraine Mar 05 '16 at 19:14
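
The broadcasting variant from unutbu's comment can be sketched as follows. The sample frame and coefficient values are assumptions for illustration; the sketch compares the broadcast form with the zip-based generator on the same inputs, and makes no claim about relative speed:

```python
import pandas as pd

# Hypothetical data; values are chosen so results are exact in floating point.
df = pd.DataFrame({
    "one": [1.0, 2.0],
    "two": [10.0, 20.0],
    "three": [100.0, 200.0],
    "four": [1000.0, 2000.0],
})

var = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # stand-ins for x, y, z, a, b, c
cols = ["one", "two", "three", "four"]

# Slice the coefficients to match the number of selected columns, multiply
# column-wise (the list lines up with the columns by position), then sum rows.
broadcast = (df[cols] * var[:len(cols)]).sum(axis=1)

# The generator form from the accepted answer, for comparison.
looped = sum(df[col] * v for col, v in zip(cols, var))

print(broadcast.tolist())  # [5432.0, 10864.0]
print(looped.tolist())     # [5432.0, 10864.0]
```

Unlike zip, the broadcast form needs the explicit slice var[:len(cols)], because the list being multiplied must have exactly one entry per selected column.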
