I have a large dataset and I want to normalize every column so that each one starts at 100. I used the following code:
df.apply(lambda x: (x / x.iloc[0])*100)
But some columns start with 0, so the division returns nan (and inf for the later rows). How can I amend the code so that it divides by the first non-zero value in each column instead of the first row's value?
This is a sample of my DataFrame:
# DataFrame using arrays.
import pandas as pd
# initialise data of lists.
data = {'marksA': [99, 98, 95, 80, 98, 95, 85],
        'marksB': [0, 0, 95, 80, 98, 95, 85],
        'marksC': [89, 98, 95, 83, 98, 95, 85]}
# Creates pandas DataFrame.
df = pd.DataFrame(data, index=['2000/01/01', '2001/01/01', '2002/01/01', '2003/01/01', '2004/01/01', '2005/01/01', '2006/01/01'])
# print the data
df
            marksA  marksB  marksC
2000/01/01      99       0      89
2001/01/01      98       0      98
2002/01/01      95      95      95
2003/01/01      80      80      83
2004/01/01      98      98      98
2005/01/01      95      95      95
2006/01/01      85      85      85
normalization = df.apply(lambda x: (x / x.iloc[0])*100)
normalization
            marksA  marksB  marksC
2000/01/01  100.00     nan  100.00
2001/01/01   98.99     nan  110.11
2002/01/01   95.96     inf  106.74
2003/01/01   80.81     inf   93.26
2004/01/01   98.99     inf  110.11
2005/01/01   95.96     inf  106.74
2006/01/01   85.86     inf   95.51
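For reference, here is a rough sketch of the behaviour I am after, in case it helps clarify the question. The helper name first_nonzero is my own invention, not a pandas function, and I am assuming that leading zeros should simply stay 0 in the result and that a column containing only zeros should come out as nan. The second variant (mask followed by bfill) is just one possible vectorized way to get the same base values; there may well be a cleaner approach.

```python
import pandas as pd

data = {'marksA': [99, 98, 95, 80, 98, 95, 85],
        'marksB': [0, 0, 95, 80, 98, 95, 85],
        'marksC': [89, 98, 95, 83, 98, 95, 85]}
df = pd.DataFrame(data, index=['2000/01/01', '2001/01/01', '2002/01/01',
                               '2003/01/01', '2004/01/01', '2005/01/01',
                               '2006/01/01'])


def first_nonzero(col):
    """Hypothetical helper: first non-zero value of a column, or nan if none."""
    nonzero = col[col != 0]
    return nonzero.iloc[0] if not nonzero.empty else float('nan')


# Variant 1: divide each column by its own first non-zero value.
normalization = df.apply(lambda x: x / first_nonzero(x) * 100)

# Variant 2 (assumed equivalent): hide zeros as NaN, back-fill so the first
# row holds each column's first non-zero value, then divide column-wise.
base = df.mask(df == 0).bfill().iloc[0]
normalization_vectorized = df / base * 100

print(normalization.round(2))
```

With the sample data, marksB should come out as 0.00, 0.00, 100.00, 84.21, 103.16, 100.00, 89.47, while marksA and marksC are unchanged from the output above.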