
Let's say I have the following dataframe:

   Shots Goals StG
0  1     2    0.5
1  3     1    0.33
2  4     4    1

Now I want to multiply the Shots variable by a random value (multiplier in the code) and recalculate the StG variable, which is nothing but Shots/Goals. The code I used is:

for index, row in df.iterrows():
    multiplier = np.random.randint(1, 5+1)
    row['Shots'] *= multiplier
    row['StG'] = float(row['Shots'])/float(row['Goals'])

Then I saved the .csv and it was identical to the original one, so after the for loop I simply used print(df) and got:

Shots Goals StG
0  1     2    0.5
1  3     1    0.33
2  4     4    1 

If I print the values row by row during the for loop I can see them change, but it's as if they are not saved in the df.

I think it is because I'm only accessing the values, not the actual dataframe.

I should probably add something like df.row[], but that fails: DataFrame has no row property.

Thanks for the help.

____EDIT____

for index, row in df.iterrows():
    multiplier = np.random.randint(1, 5+1)
    row['Impresions'] *= multiplier
    row['Clicks'] *= np.random.randint(1, multiplier+1)
    row['Ctr'] = float(row['Clicks'])/float(row['Impresions'])
    row['Mult'] = multiplier
    #print (row['Clicks'],row['Impresions'],row['Ctr'],row['Mult'])

The main condition is that the number of clicks can never be higher than the number of impressions.

Then I recalculate the Clicks/Impressions ratio as the CTR.

I am not sure whether multiplying the entire column is the best way to maintain the condition that Impressions >= Clicks for each row, hence I went row by row.

DDDDEEEEXXXX
  • see related: http://stackoverflow.com/questions/31458794/python-using-iterrows-to-create-columns – EdChum Apr 11 '17 at 16:06

2 Answers


From the pandas docs for iterrows() (pandas.DataFrame.iterrows):

"You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect."
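If a row-by-row loop is still wanted, one possible way to make the changes stick is to write through the DataFrame itself (for example with df.at) instead of through the row that iterrows() yields. The following is only a minimal sketch of that idea: the sample data is copied from the question, while the names original, df_lost, df_kept and rng are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
original = pd.DataFrame({"Shots": [1, 3, 4],
                         "Goals": [2, 1, 4],
                         "StG": [0.5, 0.33, 1.0]})
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# 1) Writing to the row yielded by iterrows(): with mixed int/float
#    columns the row is a copy, so the DataFrame is left untouched.
df_lost = original.copy()
for index, row in df_lost.iterrows():
    row["Shots"] *= 2

# 2) Writing through the DataFrame with df.at: the changes are kept.
df_kept = original.copy()
for index in df_kept.index:
    multiplier = int(rng.integers(1, 6))  # random integer from 1 to 5
    df_kept.at[index, "Shots"] *= multiplier
    df_kept.at[index, "StG"] = df_kept.at[index, "Shots"] / df_kept.at[index, "Goals"]

print(df_lost)
print(df_kept)
```

This kind of loop tends to be slow on large frames, so it is probably only worth using when the per-row logic cannot be expressed on whole columns.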

The good news is you don't need to iterate over rows - you can perform the operations on columns:

# Generate an array of random integers of same length as your DataFrame
multipliers = np.random.randint(1, 5+1, size=len(df))

# Multiply corresponding elements from df['Shots'] and multipliers
df['Shots'] *= multipliers

# Recalculate df['StG']
df['StG'] = df['Shots']/df['Goals']
sgrg
  • Hello sgrg, I tried to propose a simple example. I'll keep in mind to check the docs before posting a question. Imagine now that I also want to multiply the column shots, but the logic is that the amount of shots can never be bigger than goals in the same row. I iterated row by row, making sure the multiplier of goals is always between 0 and the amount of goals of the same row, which ensures this condition. Give me 5 minutes and I'll add the original code. – DDDDEEEEXXXX Apr 11 '17 at 16:31
  • It's not quite clear what you're asking, and your updated question has a different example, but you can also filter columns based on conditions. See http://stackoverflow.com/questions/18196203/how-to-conditionally-update-dataframe-column-in-pandas for an example. – sgrg Apr 11 '17 at 17:24
  • Also you're best off posting this as a new question for more traction (and linking back to this question for reference) :) – sgrg Apr 11 '17 at 17:25
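For the Clicks/Impressions variant in the question's edit, the per-row condition may not require a loop either. A rough sketch, assuming the starting data already satisfies Clicks <= Impressions: draw one multiplier per row, then draw the clicks multiplier from 1 up to that row's multiplier, so Clicks can never overtake Impressions. The column names (including the spelling Impresions) follow the question's code; the sample values and the names rng, mult and click_mult are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; every row starts with Clicks <= Impresions.
df = pd.DataFrame({"Impresions": [100, 250, 40],
                   "Clicks": [10, 250, 0]})
rng = np.random.default_rng(42)  # seeded only to make the sketch repeatable

# One multiplier per row for the impressions, from 1 to 5.
mult = rng.integers(1, 6, size=len(df))

# One multiplier per row for the clicks, from 1 to that row's multiplier
# (the upper bound is exclusive, hence mult + 1).
click_mult = rng.integers(1, mult + 1)

df["Impresions"] *= mult
df["Clicks"] *= click_mult
df["Ctr"] = df["Clicks"] / df["Impresions"]
df["Mult"] = mult

print(df)
```

Because click_mult is never larger than mult in the same row, the inequality that held before the multiplication still holds afterwards.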

Define a function that returns a series:

def f(x):
    m = np.random.randint(1,5+1)
    return pd.Series([x.Shots * m, x.Shots/x.Goals * m])

Apply the function to the data frame row-wise. It returns another data frame, which can be used to replace columns in the existing data frame or to create new ones:

df[['Shots', 'StG']] = df.apply(f, axis=1)

This approach is very flexible as long as the new column values depend only on other values in the same row.
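A self-contained variant of the same idea, offered only as a sketch: here the returned Series carries explicit labels so the resulting columns are named rather than numbered. The sample data comes from the question; the function name scale_row and the names rng and original_shots are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seeded only to make the sketch repeatable

def scale_row(row):
    """Return the scaled Shots and the recomputed StG for one row."""
    m = rng.integers(1, 6)  # random integer from 1 to 5
    shots = row["Shots"] * m
    return pd.Series({"Shots": shots, "StG": shots / row["Goals"]})

# Sample data copied from the question.
df = pd.DataFrame({"Shots": [1, 3, 4],
                   "Goals": [2, 1, 4],
                   "StG": [0.5, 0.33, 1.0]})
original_shots = df["Shots"].copy()

# apply(..., axis=1) calls scale_row once per row and stacks the
# returned Series into a new DataFrame with columns Shots and StG.
df[["Shots", "StG"]] = df.apply(scale_row, axis=1)

print(df)
```

Note that the rows passed to the function are upcast to float when the frame mixes integer and float columns, so Shots comes back as a float column in this sketch.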

Haleemur Ali