What's the pandas way of computing a new value for each row of a dataframe?

Question

I have a dataframe like this:

     name   upvotes  posts  
  0  Britt  4        232
  1  Henry  1        152
     ...
  9  Kevin  1        48

I want to create a new column, let's call it clout, that is a function of a user's upvotes and posts.

In standard fare Python, if this was a list of dictionaries, I would approach the problem iteratively as follows:

for row in myListOfDicts:
    row['clout'] = computeClout(row['upvotes'],row['posts'])

But this approach seems wrong in pandas, based on this answer: https://stackoverflow.com/a/55557758/4382391

So what should I be doing in this case?

For a general `computeClout` function, that's the only way. Alternatively, you can rewrite your function so that it accepts Series as input, and then do `df['clout'] = computeClout(df['upvotes'], df['posts'])`. – Quang Hoang, Nov 13 '20 at 20:47
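As a rough illustration of the comment above, here is a minimal sketch of a `computeClout` that works on whole Series. The formula (upvotes per post) and the three-row sample frame are assumptions made up for this example; the question does not say what `computeClout` actually computes.

```python
import pandas as pd

# Sample data mirroring a few rows from the question.
df = pd.DataFrame({
    "name": ["Britt", "Henry", "Kevin"],
    "upvotes": [4, 1, 1],
    "posts": [232, 152, 48],
})

def computeClout(upvotes, posts):
    # Hypothetical formula, for illustration only: upvotes earned per post.
    # Because it uses only arithmetic operators, it works on scalars and
    # on pandas Series alike.
    return upvotes / posts

# The whole column is computed in one vectorized call, with no Python-level loop.
df["clout"] = computeClout(df["upvotes"], df["posts"])
```

This only works when the function body is built from operations that pandas or NumPy can apply element-wise; a function with branching on individual values would need rewriting first.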

score 2 · Accepted Answer · answered Nov 13 '20 at 20:50

You can try

df['clout'] = df[['upvotes', 'posts']].apply(lambda row: computeClout(*row), axis=1)

Renaud

score 1 · Answer 2 · answered Nov 13 '20 at 20:48

You can use `apply` as follows:

df['clout'] = df.apply(lambda row: computeClout(row['upvotes'],row['posts']), axis=1)

rpanai

Note that `apply` is essentially equivalent to the code above. – Quang Hoang Nov 13 '20 at 20:50
Using `apply`, even if not ideal, is faster than a loop. – rpanai Nov 13 '20 at 20:55
Actually, a list comprehension is faster than `apply`, which is faster than `iterrows`. But they are all essentially Python for loops, so the difference is generally small. – Quang Hoang Nov 13 '20 at 20:59
@QuangHoang That's true, but if you check the timestamps, I answered a few minutes earlier, so I couldn't see the other answer. – rpanai Jun 29 '21 at 17:55
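For reference, the list-comprehension variant mentioned in the comments could look like the sketch below. The `computeClout` body and the sample data are placeholders invented for this example, and no timings are claimed here; measure on your own data if speed matters.

```python
import pandas as pd

# Sample data mirroring a few rows from the question.
df = pd.DataFrame({
    "name": ["Britt", "Henry", "Kevin"],
    "upvotes": [4, 1, 1],
    "posts": [232, 152, 48],
})

def computeClout(upvotes, posts):
    # Hypothetical scalar function, for illustration only.
    return upvotes / posts

# List comprehension: iterate over the two columns in lockstep.
df["clout"] = [computeClout(u, p) for u, p in zip(df["upvotes"], df["posts"])]

# Row-wise apply, as in the answer above, for comparison.
clout_apply = df.apply(
    lambda row: computeClout(row["upvotes"], row["posts"]), axis=1
)
```

Both forms call `computeClout` once per row and should produce the same values; they differ only in how the rows are iterated.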
