Multiply column values by a scalar based on conditions in a DataFrame

Question

I want to multiply column values by a specific scalar based on the name of the column:

if column name = "Math", then all the values in the "Math" column should be multiplied by 5;
if column name = "Physique", values in that column should be multiplied by 4;
if column name = "Bio", values in that column should be multiplied by 3;
all the remaining columns should be multiplied by 2.

What I have:

This is what I should have :

listm = ['Math', 'Physique', 'Bio']
def note_coef(row):
    for m in listm:
        if 'Math' in listm:
             result = df['Math']*5
    return result

df2=df.apply(note_coef)
df2

Note: I stopped after only one if condition to test my code, but the outcome is not what I expected. I am quite new to programming, and to this site as well.

Please don’t post images of the data as we can’t test them. Instead, create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), by sharing a sample of the DataFrame(s) and the expected output within a code block along with the description of the problem. This allows us to easily reproduce your problem and help you. This should help: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — Rodalm, Jul 01 '22 at 00:00
Sorry about that. As I am new, I have to learn how to post a question as requested by the site. I will try my best next time. Thank you. — Want2Learn, Jul 01 '22 at 16:37
No worries, I understand that, we've all been there ;) It's just for future reference. Cheers — Rodalm, Jul 01 '22 at 21:08

AboAmmar · Answer 1 · 2022-07-01T00:37:51.697

0

You can use df['col_name'].multiply(value) to multiply a whole column by a scalar. The remaining columns can then be handled in a loop over all columns that are not in listm.

listm = ['Math', 'Physique', 'Bio']
# Multipliers 5, 4, 3 follow the order of listm
for i, head in enumerate(listm):
    df[head] = df[head].multiply(5-i)

# Every other column gets the default multiplier 2
for head in df.columns:
    if head not in listm:
        df[head] = df[head].multiply(2)

edited Jul 01 '22 at 00:37

answered Jul 01 '22 at 00:03


Hello AboAmmar, thank you for your response. My question is: what if the scalars are not 5, 4, 3? They can actually be any int. – Want2Learn Jul 03 '22 at 12:57
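
One way to handle arbitrary scalars, as asked in the comment above, is to keep them in a dictionary instead of deriving them from the loop index. This is only a sketch: the sample data, the coefficient values, and the names coefs and default_coef are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "Math": [10, 11],
    "Physique": [15, 16],
    "Bio": [20, 21],
    "Algo": [25, 26],
})

# Any integers can go here; these values are only for illustration.
coefs = {"Math": 7, "Physique": 1, "Bio": 12}
default_coef = 2

for col in df.columns:
    # dict.get falls back to default_coef for columns not listed in coefs
    df[col] = df[col].multiply(coefs.get(col, default_coef))

print(df)
```

Changing a coefficient then only means editing the dictionary; the loop stays the same.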

score 0 · Answer 2 · answered Jul 01 '22 at 00:12

Fake data

import numpy as np
import pandas as pd

n=5
d = {'a':np.ones(n),
     'b':np.ones(n),
     'c':np.ones(n),
     'd':np.ones(n)}
df = pd.DataFrame(d)
print(df)

Select the columns and multiply them by a tuple holding one factor per selected column, matched by position.

df[['a','c']] = df[['a','c']] * (2,4)
print(df)

     a    b    c    d
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0

     a    b    c    d
0  2.0  1.0  4.0  1.0
1  2.0  1.0  4.0  1.0
2  2.0  1.0  4.0  1.0
3  2.0  1.0  4.0  1.0
4  2.0  1.0  4.0  1.0
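
Because the tuple is matched to the selected columns by position, the column list and the factors have to be kept in the same order by hand. A possible variation, sketched here under the assumption that a label-based mapping is acceptable, keeps both in one dictionary and lets pandas align on column labels. The name factors is introduced only for this example.

```python
import numpy as np
import pandas as pd

# Same kind of fake data as above: four columns of ones.
df = pd.DataFrame({name: np.ones(5) for name in "abcd"})

# Hypothetical mapping that holds both the columns and their factors.
factors = {"a": 2, "c": 4}
cols = list(factors)  # ['a', 'c']

# A Series is aligned with the DataFrame on column labels, not on position.
df[cols] = df[cols] * pd.Series(factors)

print(df)
```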

Naveed · Answer 3 · 2022-07-01T16:01:23.757

Here is another way to do it, using array multiplication.

The data was not provided as text, so I created test data following the pattern of the screenshot.

mul = [5,4,3,2,2,2,2,1]  # multipliers, one per column after 'source'; 'total' gets 1

df1 = df.iloc[:,1:].mul(mul)
df1['total'] = df1.iloc[:,:7].sum(axis=1)  # recompute the total from the weighted marks
df.update(df1, join='left', overwrite=True)
df

    source  Math    Physics     Bio     Algo    Archi   Sport   eng     total
0     A     50.0    60.0       60.0     50.0    60.0    70.0    80.0    430.0
1     B     55.0    64.0       63.0     52.0    62.0    72.0    82.0    450.0
2     C     5.5     8.4         9.3     NaN     NaN     NaN     NaN     23.2
3     D     NaN     NaN         NaN     22.0    42.0    62.0    82.0    208.0
4     E     6.0     8.8         9.6     NaN     NaN     NaN     NaN     24.4
5     F     NaN     NaN         NaN     24.0    44.0    64.0    84.0    216.0

TEST DATA

import numpy as np
import pandas as pd

data_out = [
    ['A', 10,15,20,25,30,35,40],
    ['B', 11,16,21,26,31,36,41],
    ['C', 1.1,2.1,3.1],
    ['D', np.nan,np.nan,np.nan,11,21,31,41],
    ['E', 1.2,2.2,3.2],
    ['F', np.nan,np.nan,np.nan,12,22,32,42],
]
df=pd.DataFrame(data_out, columns=[ 'source', 'Math', 'Physics', 'Bio', 'Algo', 'Archi', 'Sport', 'eng'])
df['total'] = df.iloc[:,1:].sum(axis=1)

    source  Math    Physics     Bio     Algo    Archi   Sport   eng     total
0     A     10.0    15.0       20.0     25.0    30.0    35.0    40.0    175.0
1     B     11.0    16.0       21.0     26.0    31.0    36.0    41.0    182.0
2     C     1.1     2.1         3.1     NaN     NaN     NaN     NaN     6.3
3     D     NaN     NaN         NaN     11.0    21.0    31.0    41.0    104.0
4     E     1.2     2.2         3.2     NaN     NaN     NaN     NaN     6.6
5     F     NaN     NaN         NaN     12.0    22.0    32.0    42.0    108.0

score 0 · Accepted Answer · answered Jul 01 '22 at 21:24

I think the most elegant solution is to define a dictionary (or a pandas.Series) with the multiplying factor for each column of your DataFrame (factors). Then you can multiply all the columns with the corresponding factor simply using df *= factors.

The multiplication is done via column axis alignment, i.e. by aligning the df.columns with the dictionary keys.

For instance, given the following DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.ones(shape=(4, 5)), columns=['Math', 'Physique', 'Bio', 'Algo', 'Archi'])

>>> df

   Math  Physique  Bio  Algo  Archi
0   1.0       1.0  1.0   1.0    1.0
1   1.0       1.0  1.0   1.0    1.0
2   1.0       1.0  1.0   1.0    1.0
3   1.0       1.0  1.0   1.0    1.0

You can do:

factors = {'Math': 5, 'Physique': 4, 'Bio': 3}
default_factor = 2

factors.update({col: default_factor for col in df.columns if col not in factors})

df *= factors

print(df)

Output:

   Math  Physique  Bio  Algo  Archi
0   5.0       4.0  3.0   2.0    2.0
1   5.0       4.0  3.0   2.0    2.0
2   5.0       4.0  3.0   2.0    2.0
3   5.0       4.0  3.0   2.0    2.0
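
To unpack the factors.update(...) line used above: the part inside the braces is a dict comprehension that creates one entry for every column without an explicit factor, and update() merges those entries into factors. The sketch below shows the two steps separately; the columns list stands in for df.columns, and the name missing is introduced only for this illustration.

```python
columns = ["Math", "Physique", "Bio", "Algo", "Archi"]  # stands in for df.columns
factors = {"Math": 5, "Physique": 4, "Bio": 3}
default_factor = 2

# Dict comprehension: one entry per column that has no factor yet.
missing = {col: default_factor for col in columns if col not in factors}
print(missing)  # {'Algo': 2, 'Archi': 2}

# dict.update merges those entries into factors in place.
factors.update(missing)
print(factors)  # {'Math': 5, 'Physique': 4, 'Bio': 3, 'Algo': 2, 'Archi': 2}
```

After this step every column has a key in factors, which is what lets the multiplication line up each column with its own factor.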

This is exactly what I was looking for. Can you explain what this code does? factors.update({col: default_factor for col in df.columns if col not in factors}). — Want2Learn, Jul 03 '22 at 07:29
However, I noticed that (1) the first column of my DataFrame, which is a str, is also multiplied by the default_factor; (2) the values in my DataFrame keep changing every time the code is run. I'd like the result to be the same no matter how many times the code is executed. Here is the code I came up with: coefi = [1, 5, 4, 3, 2, 2, 2, 2]; df_coefi = df[['Nom Etudiant', 'Math', 'Physique', 'Bio', 'Algo', 'Archi', 'Sport', 'Ang']] * coefi — Want2Learn, Jul 03 '22 at 07:36
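
Both points from the comments above could be addressed by restricting the factors to numeric columns and by writing the result to a new DataFrame instead of using *=. This is a sketch under assumptions: the sample data is made up, and the names numeric_cols, coefs and df_coef are introduced here for illustration.

```python
import pandas as pd

# Hypothetical data: 'Nom Etudiant' is a text column, the rest are marks.
df = pd.DataFrame({
    "Nom Etudiant": ["A", "B"],
    "Math": [10, 11],
    "Physique": [15, 16],
    "Bio": [20, 21],
    "Algo": [25, 26],
})

factors = {"Math": 5, "Physique": 4, "Bio": 3}
default_factor = 2

# Only numeric columns get a factor, so the name column is left alone.
numeric_cols = df.select_dtypes(include="number").columns
coefs = pd.Series({col: factors.get(col, default_factor) for col in numeric_cols})

# Build a new frame instead of using `*=`, so re-running this code
# always starts from the unmodified df and gives the same result.
df_coef = df.copy()
df_coef[numeric_cols] = df[numeric_cols] * coefs

print(df_coef)
```

Since df itself is never modified, running the code repeatedly produces the same df_coef each time.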
