let's say I have the following dataframe:
Shots Goals StG
0 1 2 0.5
1 3 1 0.33
2 4 4 1
Now I want to multiply the variable Shots for a random value (multiplier in the code) and recaclucate the StG variable that is nothing but Shots/Goals, the code I used is:
for index,row in df.iterrows():
multiplier = (np.random.randint(1,5+1))
row['Shots'] *= multiplier
row['StG']=float(row['Shots'])/float(row['Goals'])
Then I saved the .csv and it was identically at the original one, so after the for I simply used print(df) to obtain:
Shots Goals StG
0 1 2 0.5
1 3 1 0.33
2 4 4 1
If I print the values row per row during the for iteration I see they change, but its like they don't save in the df.
I think it is because I'm simply accessing to the values,not the actual dataframe.
I should add something like df.row[], but it returns DataFrame has no row property.
Thanks for the help.
____EDIT____
for index,row in df.iterrows():
multiplier = (np.random.randint(1,5+1))
row['Impresions']*=multiplier
row['Clicks']*=(np.random.randint(1,multiplier+1))
row['Ctr']= float(row['Clicks'])/float(row['Impresions'])
row['Mult']=multiplier
#print (row['Clicks'],row['Impresions'],row['Ctr'],row['Mult'])
The main condition is that the number of Clicks cant be ever higher than the number of impressions.
Then I recalculate the ratio between Clicks/Impressions on CTR.
I am not sure if multiplying the entire column is the best choice to maintain the condition that for each row Impr >= Clicks, hence I went row by row