How do I divide each number by the sum (skipping zeros)? I'd like to divide each row by its sum.
For example: 0 (the value in column 0) / 2 (the value in the sum column)
0 1 2 3 4 5 6 7 ... sum
0 0 0 0 1 0 0 0 ... 2
result
0 1 2 3 4 5 6 7 ... sum
0 0 0 0 0.5 0 0 0 ... 2
You can try something like this:
# Every column except 'sum' (the .str accessor assumes the column labels are strings)
required_columns = df.columns[~df.columns.str.contains('sum')]
# contains() also accepts a regex. I am assuming that none of the columns to be divided has 'sum' in its name.
for col in required_columns:
    print(f'---------- {col} --------')
    # Note: rows where 'sum' is 0 will come out as NaN or inf
    df[col] = df[col] / df['sum']
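The loop above can also be written without iterating over columns. This is only a minimal sketch under my own assumptions: the sample data and the names value_cols and safe_sum are made up for illustration, the 'sum' column is taken as given, and a zero sum is treated as "leave the row at 0".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question; column labels are strings here.
df = pd.DataFrame(
    {"0": [0, 0, 0], "1": [1, 6, 1], "2": [0, 3, 0], "sum": [2, 18, 0]}
)

value_cols = df.columns.drop("sum")        # every column except 'sum'
safe_sum = df["sum"].replace(0, np.nan)    # a zero sum becomes NaN, so nothing is divided by 0

# Divide each row by its own sum (axis=0 aligns the divisor with the row index).
# Rows whose sum was zero come out as NaN and are reset to 0.
df[value_cols] = df[value_cols].div(safe_sum, axis=0).fillna(0)

print(df)
```

Assigning back with df[value_cols] replaces those columns, so they simply become float columns and 'sum' is left untouched.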
You can also use this to get the same result:
df.iloc[:,:-1] = df.apply(lambda r: r/r['sum'] if r['sum'] != 0 else r['sum'],axis=1).round(2)
For example, given this source df:
   0  1  2  3  4  5  6  7  sum
0  0  0  0  0  1  0  0  0    2
1  0  0  0  6  0  0  0  0   18
2  0  0  0  0  1  0  0  0    0
3  0  0  3  0  0  0  4  0    1
This will result in:
     0    1    2     3    4    5    6    7  sum
0  0.0  0.0  0.0  0.00  0.5  0.0  0.0  0.0    2
1  0.0  0.0  0.0  0.33  0.0  0.0  0.0  0.0   18
2  0.0  0.0  0.0  0.00  0.0  0.0  0.0  0.0    0
3  0.0  0.0  3.0  0.00  0.0  0.0  4.0  0.0    1
Here is an explanation of the above code:
On the left-hand side of the assignment I use iloc. See the pandas documentation for iloc for more details.
df.iloc[:,:-1]
Here I am picking all the rows (the first :). The second part, :-1, selects the columns. The value computed on the right-hand side is assigned to every column except the last one, which is the sum column. I don't want to replace that value.
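To see what that selection picks out, here is a small sketch on a made-up frame (the column names a and b are hypothetical; the only assumption that matters is that 'sum' is the last column):

```python
import pandas as pd

# Hypothetical three-column frame; 'sum' is deliberately the last column.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "sum": [4, 6]})

all_but_last = df.iloc[:, :-1]   # all rows, every column except the last
last_only = df.iloc[:, -1]       # all rows, only the last column (a Series)

print(list(all_but_last.columns))  # ['a', 'b']
print(last_only.tolist())          # [4, 6]
```

Because iloc is purely positional, this only works if 'sum' really is the last column. If it could be anywhere, selecting by label is safer.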
With axis=1, df.apply processes the dataframe one row at a time. See the pandas documentation for df.apply for examples.
In the lambda, r is the current row. You wanted to compute column(x) / column('sum'), and r/r['sum'] does that for every column in the row.
I am also checking that r['sum'] is not equal to zero, to avoid dividing by zero (which would give inf or NaN). If the value of r['sum'] is zero, I return r['sum'] (that is, zero).
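If the lambda is hard to read, the same zero check can be spelled out as a named function. This is a sketch rather than the exact one-liner: divide_by_sum and the sample data are my own hypothetical names, and this variant drops 'sum' from the row before dividing, so the result only contains the value columns.

```python
import pandas as pd

# Hypothetical sample data; the third row has a sum of 0.
df = pd.DataFrame({"a": [1, 6, 1], "b": [1, 3, 0], "sum": [2, 18, 0]})

def divide_by_sum(row: pd.Series) -> pd.Series:
    """Return the row's values divided by row['sum'], or zeros if the sum is 0."""
    values = row.drop("sum")
    if row["sum"] == 0:
        return values * 0.0       # same labels, all zeros
    return values / row["sum"]

# axis=1 passes each row to the function as a Series.
ratios = df.apply(divide_by_sum, axis=1).round(2)

print(ratios)
```

Returning a Series with the same labels in both branches keeps the shape of the result predictable.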
A DataFrame object has two axes: "axis 0" (the row index) and "axis 1" (the columns). With axis=0, apply passes each column to the function; with axis=1, it passes each row. I am using axis=1 so that the function works on one row at a time instead of on the values in each column.
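A quick way to convince yourself of the difference is to sum along each axis on a tiny frame (the data and the names per_column and per_row are made up for this sketch):

```python
import pandas as pd

# Hypothetical 2x2 frame.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

per_column = df.apply(lambda s: s.sum(), axis=0)  # function receives each column
per_row = df.apply(lambda s: s.sum(), axis=1)     # function receives each row

print(per_column.to_dict())  # {'a': 3, 'b': 30}
print(per_row.tolist())      # [11, 22]
```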
Hope this explanation helps.