def comment(row):
    if row['STATUS'] == "CANCELLED":
        return "Cancelled"
    elif strToDate(row['PROCESS_DATE']) < datetime(2018, 1, 1) or strToDate(row['PROCESS_DATE']) > datetime(2018, 2, 1):
        return "Date out of Range"
    elif "Lost" in str(row['NOTE']) or "Stolen" in str(row['TRADE_NOTE_TXT']):
        return 'Lost or Stolen'
    else:
        return 'Other'

df['Comment'] = ''

for i, row in df.iterrows():
    df.at[i,"Comment"] = comment(row)

I use the code above to set the value of df['Comment'] based on these conditions. When I run df.count(), it shows there are 7790 values in Comment.

However, when I run df.groupby('Comment').size(), the output is as follows, and the counts look much greater than the number of comments that should be present.

     Comment
     Cancelled            1171
     Date out of Range    1175
     Lost or Stolen        634
     Other                4810
     dtype: int64

2 Answers


Maybe I'm misunderstanding what you're asking, but those numbers add up:

1171 + 1175 + 634 + 4810 = 7790

This means that df.count() and df.groupby('Comment').size() account for the same number of rows.
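A quick way to check this yourself is to compare the sum of the group sizes with the column count. The snippet below is only a sketch on a small made-up DataFrame (the six rows are hypothetical, not your data); the same comparison should apply to your df.

```python
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame.
df = pd.DataFrame({
    "Comment": [
        "Cancelled", "Other", "Other",
        "Lost or Stolen", "Date out of Range", "Other",
    ]
})

total = df["Comment"].count()         # non-null values in the column
sizes = df.groupby("Comment").size()  # number of rows per distinct label

print(sizes)
print(total, sizes.sum())

# The group sizes partition the rows, so they sum to the column count.
assert sizes.sum() == total
```

If the assertion fails on the real data, the likely cause is missing values in Comment: groupby drops NaN keys by default, and count() also excludes them, so comparing both against len(df) would show how many rows have no label.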

tobsecret

You first need to indent the code under the def comment(row): function properly to get the result you expect.

rahlf23
Keith W