First, let me say: I know I shouldn't be iterating over a DataFrame, per "How to iterate over rows - Don't!" and similar posts.
However, for my application I don't think I have a better option, although I am relatively new to Python and pandas and may simply lack the knowledge. While iterating over the rows, I need to access an adjacent row's data, and I can't figure out how to do that with vectorization or a list comprehension.
That leaves me with iteration. I have seen several posts on iterrows() and itertuples(), either of which would work. Before I found out about them, though, I tried:
for i in workingDF.index:
    if i == 0:
        list2Add = ['NaN']
        compareItem = workingDF.at[0, 'name']
    else:
        if workingDF.at[i, 'name'] != compareItem:
            list2Add.append('NaN')
            compareItem = workingDF.at[i, 'name']
        else:
            currentValue = workingDF.at[i, 'value']
            yesterdayValue = workingDF.at[i - 1, 'value']
            r = currentValue - yesterdayValue
            list2Add.append(r)
Anyway, my naive code seems to work as intended (so far). So the question is: is there some inherent reason to avoid "for i in workingDF.index" and prefer the standard iterrows() and itertuples() instead? (Presumably there must be, since those are the "recommended" methods...)
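For comparison, here is a minimal sketch of the same logic written with itertuples(), assuming the same 'name' and 'value' columns as in the loop above. The function name add_change_itertuples and the use of float('nan') in place of the string 'NaN' are illustrative assumptions, not part of the original loop. One difference that appears to matter: this version carries the previous row's value along in a variable, so it does not depend on the index being 0, 1, 2, ... the way workingDF.at[i - 1, 'value'] does.

```python
import pandas as pd


def add_change_itertuples(df):
    """Hypothetical helper: per-name change in 'value', computed with itertuples()."""
    changes = []
    prev_name = None
    prev_value = None
    # index=False yields plain namedtuples of the column values only
    for row in df.itertuples(index=False):
        if row.name != prev_name:
            # first row of a new name: no prior value to compare against
            changes.append(float("nan"))
        else:
            changes.append(row.value - prev_value)
        prev_name, prev_value = row.name, row.value
    # assign() returns a new DataFrame and leaves the input untouched
    return df.assign(change=changes)


if __name__ == "__main__":
    # Deliberately non-default index labels, where an i - 1 lookup would fail
    demo = pd.DataFrame(
        {"name": ["a", "a", "b", "b"], "value": [1, 3, 10, 7]},
        index=[10, 20, 30, 40],
    )
    print(add_change_itertuples(demo))
```

With the index loop, the same input would raise a KeyError at i - 1 because labels such as 9 or 19 do not exist, which may be one practical reason to prefer the row iterators.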
Thanks in advance. Jim
EDIT: An example was requested. In this example each row contains a name, a test number (TestNo), and a score. The example code creates a new column labelled "change", which holds the change in the current score relative to the most recent prior score for the same name. Example code:
import pandas as pd


def createDF():
    # lists of name, testNo, score
    nme2 = ["bob", "bob", "bob", "bob", "jim", "jim", "jim", "jim", "ed", "ed", "ed", "ed"]
    tstNo2 = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
    scr2 = [82, 81, 80, 79, 93, 94, 95, 98, 78, 85, 90, 92]
    # dictionary of lists (named "data" to avoid shadowing the built-in dict)
    data = {'name': nme2, 'TestNo': tstNo2, 'score': scr2}
    workingDF = pd.DataFrame(data)
    return workingDF


def addChangeColumn(workingDF):
    """
    returns a Dataframe object with an added column named
    "change" which represents the change in score compared to
    most recent prior test result
    """
    for i in workingDF.index:
        if i == 0:
            list2Add = ['NaN']
            compareItem = workingDF.at[0, 'name']
        else:
            if workingDF.at[i, 'name'] != compareItem:
                list2Add.append('NaN')
                compareItem = workingDF.at[i, 'name']
            else:
                currentScore = workingDF.at[i, 'score']
                yesterdayScore = workingDF.at[i - 1, 'score']
                r = currentScore - yesterdayScore
                list2Add.append(r)
    modifiedDF = pd.concat([workingDF, pd.Series(list2Add, name='change')], axis=1)
    return modifiedDF


if __name__ == '__main__':
    myDF = createDF()
    print('myDF is:')
    print(myDF)
    print()
    newDF = addChangeColumn(myDF)
    print('newDF is:')
    print(newDF)
Example Output:
myDF is:
    name  TestNo  score
0    bob       1     82
1    bob       2     81
2    bob       3     80
3    bob       4     79
4    jim       1     93
5    jim       2     94
6    jim       3     95
7    jim       4     98
8     ed       1     78
9     ed       2     85
10    ed       3     90
11    ed       4     92

newDF is:
    name  TestNo  score change
0    bob       1     82    NaN
1    bob       2     81     -1
2    bob       3     80     -1
3    bob       4     79     -1
4    jim       1     93    NaN
5    jim       2     94      1
6    jim       3     95      1
7    jim       4     98      3
8     ed       1     78    NaN
9     ed       2     85      7
10    ed       3     90      5
11    ed       4     92      2
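For reference, the "change" column described above can probably also be produced without an explicit loop by combining groupby() with diff(). The sketch below is only a sketch: the function name add_change_vectorized is hypothetical, the data is a shortened subset of the example, and it assumes the rows for each name are already in test order. Note that groupby() compares rows within each name wherever they sit in the frame, whereas the loop compares physically adjacent rows, so the two only agree when each name's rows are contiguous.

```python
import pandas as pd


def add_change_vectorized(df):
    """Hypothetical helper: per-name score difference via groupby().diff()."""
    out = df.copy()
    # diff() subtracts the previous row within each name group;
    # the first row of every group has no predecessor and becomes NaN
    out["change"] = out.groupby("name")["score"].diff()
    return out


if __name__ == "__main__":
    # Shortened subset of the example data (first two tests per name)
    demo = pd.DataFrame({
        "name": ["bob", "bob", "jim", "jim", "ed", "ed"],
        "TestNo": [1, 2, 1, 2, 1, 2],
        "score": [82, 81, 93, 94, 78, 85],
    })
    print(add_change_vectorized(demo))
```

Here the missing values are real floating-point NaN rather than the string 'NaN', so the resulting column is numeric and can be used directly in further arithmetic.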
Thank you.