How can I iterate through each row of a pandas dataframe, then conditionally set a new value in that row?

Question

I am working on a school project, so please no exact answers. I have a pandas dataframe that has numerators and denominators rating images of dogs out of 10. When there are multiple dogs in the image, the rating is out of number of dogs * 10. I am trying to adjust it so that for example... if there are 5 dogs, and the rating is 40/50, then the new numerator/denominator is 8/10. Here is an example of my code. I am aware that the syntax does not work in line 3, but I believe it accurately represents what I am trying to accomplish. twitter_archive is the dataframe.

twitter_archive['new_denom'] = 10
twitter_archive['new_numer'] = 0
for numer, denom in twitter_archive['rating_numerator','rating_denominator']:
    if (denom > 10) & (denom % 10 == 0):
        num_denom = denom / 10
        new_numer = numer / num_denom
        twitter_archive['new_numer'] = new_numer

So basically I am checking whether the denominator is above 10 and, if it is, whether it is divisible by 10. If so, I find out how many times 10 goes into it and then divide the numerator by that value to get a new numerator. I think my logic for that works fine, but the issue I have is grabbing that row and then adding the new value to the new column I created, in that row. Edit: added df head.

	tweet_id	timestamp	text	rating_numerator	rating_denominator	name	doggo	floofer	pupper	puppo	avg_denom
0	8.924206e+17	2017-08-01 16:23:56+00:00	This is Phineas. He's a mystical boy. Only eve...	13.0	10.0	phineas	None	None	None	None	10
1	8.921774e+17	2017-08-01 00:17:27+00:00	This is Tilly. She's just checking pup on you....	13.0	10.0	tilly	None	None	None	None	10
2	8.918152e+17	2017-07-31 00:18:03+00:00	This is Archie. He is a rare Norwegian Pouncin...	12.0	10.0	archie	None	None	None	None	10
3	8.916896e+17	2017-07-30 15:58:51+00:00	This is Darla. She commenced a snooze mid meal...	13.0	10.0	darla	None	None	None	None	10
4	8.913276e+17	2017-07-29 16:00:24+00:00	This is Franklin. He would like you to stop ca...	12.0	10.0	franklin	None	None	None	None	10

copy/paste head below:

{'tweet_id': {0: 8.924206435553362e+17,
  1: 8.921774213063434e+17,
  2: 8.918151813780849e+17,
  3: 8.916895572798587e+17,
  4: 8.913275589266883e+17},
 'timestamp': {0: Timestamp('2017-08-01 16:23:56+0000', tz='UTC'),
  1: Timestamp('2017-08-01 00:17:27+0000', tz='UTC'),
  2: Timestamp('2017-07-31 00:18:03+0000', tz='UTC'),
  3: Timestamp('2017-07-30 15:58:51+0000', tz='UTC'),
  4: Timestamp('2017-07-29 16:00:24+0000', tz='UTC')},
 'text': {0: "This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 ",
  1: "This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 ",
  2: 'This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 ',
  3: 'This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us ',
  4: 'This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek '},
 'rating_numerator': {0: 13.0, 1: 13.0, 2: 12.0, 3: 13.0, 4: 12.0},
 'rating_denominator': {0: 10.0, 1: 10.0, 2: 10.0, 3: 10.0, 4: 10.0},
 'name': {0: 'phineas', 1: 'tilly', 2: 'archie', 3: 'darla', 4: 'franklin'},
 'doggo': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'None'},
 'floofer': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'None'},
 'pupper': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'None'},
 'puppo': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'None'}}

You do not need to use a for-loop for this. You can try looking into [numpy.select](https://numpy.org/doc/stable/reference/generated/numpy.select.html) — It_is_Chris, Jul 25 '22 at 20:08
Can you post your df (at least the head) as a copy/pastable dictionary, so people can reproduce your code? — Barry the Platipus, Jul 25 '22 at 20:08
@platipus_on_fire_333 I pasted in the csv contents of the .head() if that works — Christian Love, Jul 25 '22 at 20:26
It does not: please do a df.head().to_dict() and paste the result in your question. — Barry the Platipus, Jul 25 '22 at 20:30
@platipus_on_fire_333 Thanks for the tip. I went ahead and replaced it with the df.head().to_dict() — Christian Love, Jul 25 '22 at 20:33
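
The numpy.select suggestion from the first comment avoids the loop entirely. Below is a minimal sketch of that idea, not a definitive solution: the four sample rows and the name `multi_dog` are made up for illustration, and only the two rating columns from the question are assumed to exist.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
twitter_archive = pd.DataFrame({
    "rating_numerator": [13.0, 40.0, 12.0, 7.0],
    "rating_denominator": [10.0, 50.0, 10.0, 11.0],
})

numer = twitter_archive["rating_numerator"]
denom = twitter_archive["rating_denominator"]

# Boolean mask: True where the denominator is a multiple of 10 above 10.
multi_dog = (denom > 10) & (denom % 10 == 0)

# Where the mask is True take the rescaled value, otherwise keep the original.
twitter_archive["new_numer"] = np.select([multi_dog], [numer / (denom / 10)], default=numer)
twitter_archive["new_denom"] = np.select([multi_dog], [10.0], default=denom)
```

Here `&` is the right operator because both sides are whole columns; the comparison is evaluated for every row at once.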

Zorojuro · Accepted Answer · 2022-08-13T18:39:53.377

2

If you want to use a for loop to get row values, you can use the iterrows() function.

for idx, row in twitter_archive.iterrows():
    denom = row['rating_denominator']
    numer = row['rating_numerator']
    # You can append the computed values to a list and concat it with (or assign it to) the df
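
One way to write a value back into the row being visited is to address the cell by its index label with .at. The sketch below is only an illustration under assumptions: the sample rows are made up, and pre-filling `new_numer` with the original numerators is my choice so that rows failing the condition keep their value.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
twitter_archive = pd.DataFrame({
    "rating_numerator": [13.0, 40.0, 12.0, 7.0],
    "rating_denominator": [10.0, 50.0, 10.0, 11.0],
})

# Start from the original numerators so unaffected rows keep their value.
twitter_archive["new_numer"] = twitter_archive["rating_numerator"]

for idx, row in twitter_archive.iterrows():
    denom = row["rating_denominator"]
    numer = row["rating_numerator"]
    # Plain `and` works here because denom is a single number, not a column.
    if denom > 10 and denom % 10 == 0:
        # .at sets one cell, addressed by row label and column name.
        twitter_archive.at[idx, "new_numer"] = numer / (denom / 10)
```

Note that assigning to `twitter_archive['new_numer']` without a row label, as in the question, overwrites the whole column on every pass.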

A faster way to iterate over a df is itertuples():

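for row in twitter_archive.itertuples():
    # row[0] is the index, so access columns by name rather than by position
    denom = row.rating_denominator
    numer = row.rating_numerator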
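
With itertuples() each row is a namedtuple whose position 0 is the index label, so reading columns by attribute name is less error-prone than counting positions. A hedged sketch on made-up sample rows, collecting the results in a list (`new_numers` is a name I chose) and assigning it as a column afterwards:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
twitter_archive = pd.DataFrame({
    "rating_numerator": [13.0, 40.0, 12.0, 7.0],
    "rating_denominator": [10.0, 50.0, 10.0, 11.0],
})

new_numers = []
for row in twitter_archive.itertuples():
    # Columns are exposed as attributes named after the column labels.
    numer = row.rating_numerator
    denom = row.rating_denominator
    if denom > 10 and denom % 10 == 0:
        new_numers.append(numer / (denom / 10))
    else:
        new_numers.append(numer)

# One list entry per row, in row order, so it can be assigned directly.
twitter_archive["new_numer"] = new_numers
```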

But I think the best way to create a new column from existing ones is to use the pandas apply function.

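import pandas as pd
df = pd.DataFrame(data={'a': [1, 2], 'b': [3, 5]})
# axis=1 passes each row to the lambda, so x['a'] and x['b'] are single values
df['c'] = df.apply(lambda x: 'sum_is_odd' if (x['a'] + x['b']) % 2 == 1 else 'sum_is_even', axis=1)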

In this case, 'c' is a new column whose value is calculated from the 'a' and 'b' columns.
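
The same pattern carries over to the rating columns from the question. This is a sketch under assumptions rather than the exact code the asker ended up with: the sample rows are invented and `rescale_numerator` is a hypothetical helper name. The key point is that apply is called on the DataFrame with axis=1, so the function receives one row at a time and every lookup inside it is a single number.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
twitter_archive = pd.DataFrame({
    "rating_numerator": [13.0, 40.0, 12.0, 7.0],
    "rating_denominator": [10.0, 50.0, 10.0, 11.0],
})

def rescale_numerator(row):
    """Hypothetical helper: scale the numerator to a rating out of 10."""
    denom = row["rating_denominator"]
    # row[...] is a scalar here, so an ordinary if/and is unambiguous.
    if denom > 10 and denom % 10 == 0:
        return row["rating_numerator"] / (denom / 10)
    return row["rating_numerator"]

twitter_archive["new_numer"] = twitter_archive.apply(rescale_numerator, axis=1)
```

Referring to `twitter_archive['rating_denominator']` inside the function instead of `row['rating_denominator']` would compare a whole column in the `if`, which is what raises the "truth value of a Series is ambiguous" error.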

edited Aug 13 '22 at 18:39

answered Jul 25 '22 at 20:19

Zorojuro


Thanks for your help. How would I use another column from my dataframe inside of the lambda? For example, I would need to do something like this I think... ```twitter_archive['new_numer'] = twitter_archive['rating_numerator'].apply(lambda x: (twitter_archive['rating_numerator'] / (twitter_archive['rating_denominator'] / 10)) if ((twitter_archive['rating_denominator'] > 10) & (twitter_archive['rating_denominator'] % 10 == 0)) else twitter_archive['rating_numerator'], axis=1) ``` but I get this error: TypeError: <lambda>() got an unexpected keyword argument 'axis' – Christian Love Jul 25 '22 at 20:57
Can you elaborate on what you mean by "use x like row"? I noticed a mistake I made near .apply which was putting the column name on the dataframe, but I still get an error (with and without axis=1) : ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – Christian Love Jul 25 '22 at 21:13
I added an example to the answer. Please tell me if it is helpful. – Zorojuro Jul 25 '22 at 21:16
This did solve my problem. Thank you for the help! I ended up using the apply function. Cheers! – Christian Love Jul 25 '22 at 21:39
Is it possible to set two true values? For example, if this is true, set x['a'] = 1 AND x['b'] = 2? Or will it just need to be done in multiple .apply functions? – Christian Love Jul 25 '22 at 21:43
Take a look https://stackoverflow.com/questions/12356501/pandas-create-two-new-columns-in-a-dataframe-with-values-calculated-from-a-pre – Zorojuro Jul 25 '22 at 21:46
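
For the two-values question in the comments, one option (a sketch only, not necessarily what the linked thread recommends) is to return a tuple from the row function and let apply expand it into two columns with result_type="expand". The sample rows and the helper name `rescale` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
twitter_archive = pd.DataFrame({
    "rating_numerator": [13.0, 40.0, 12.0, 7.0],
    "rating_denominator": [10.0, 50.0, 10.0, 11.0],
})

def rescale(row):
    """Hypothetical helper returning (new numerator, new denominator)."""
    numer = row["rating_numerator"]
    denom = row["rating_denominator"]
    if denom > 10 and denom % 10 == 0:
        return numer / (denom / 10), 10.0
    return numer, denom

# result_type="expand" turns each returned tuple into one column per element.
twitter_archive[["new_numer", "new_denom"]] = twitter_archive.apply(
    rescale, axis=1, result_type="expand"
)
```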
