I have the following dataframe consisting of UserId and the Name of the badge earned by that person on Stack Overflow. Each badge belongs to a particular category such as Question, Answer, Participation, Moderation and Tag. I want to create a column called Category to store the category of each badge.
The code I have written works well when the data has fewer than 1M users, but with more data it just keeps running and never finishes. How can I fix this?
Dataframe (badges)
UserId | Name
1 | Altruist
2 | Autobiographer
3 | Enlightened
4 | Citizen Patrol
5 | python
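For anyone who wants to reproduce this, the sample above can be built as follows. This is only a small stand-in for the real data, and it assumes pandas is imported as pd; stackoverflow_badges is the variable name used in the call in the code section.

```python
import pandas as pd

# Five-row stand-in for the real badges data (the real frame has over 1M users).
stackoverflow_badges = pd.DataFrame({
    "UserId": [1, 2, 3, 4, 5],
    "Name": ["Altruist", "Autobiographer", "Enlightened", "Citizen Patrol", "python"],
})

print(stackoverflow_badges)
```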
Code
def category(df):
    questionCategory = ['Altruist', 'Benefactor', 'Curious', 'Inquisitive', 'Socratic', 'Favorite Question', 'Stellar Question', 'Investor', 'Nice Question', 'Good Question', 'Great Question', 'Popular Question', 'Notable Question', 'Famous Question', 'Promoter', 'Scholar', 'Student']
    answerCategory = ['Enlightened', 'Explainer', 'Refiner', 'Illuminator', 'Generalist', 'Guru', 'Lifejacket', 'Lifeboat', 'Nice Answer', 'Good Answer', 'Great Answer', 'Populist', 'Revival', 'Necromancer', 'Self-Learner', 'Teacher', 'Tenacious', 'Unsung Hero']
    participationCategory = ['Autobiographer', 'Caucus', 'Constituent', 'Commentator', 'Pundit', 'Enthusiast', 'Fanatic', 'Mortarboard', 'Epic', 'Legendary', 'Precognitive', 'Beta', 'Quorum', 'Convention', 'Talkative', 'Outspoken', 'Yearling']
    moderationCategory = ['Citizen Patrol', 'Deputy', 'Marshal', 'Civic Duty', 'Cleanup', 'Constable', 'Sheriff', 'Critic', 'Custodian', 'Reviewer', 'Steward', 'Disciplined', 'Editor', 'Strunk & White', 'Copy Editor', 'Electorate', 'Excavator', 'Archaeologist', 'Organizer', 'Peer Pressure', 'Proofreader', 'Sportsmanship', 'Suffrage', 'Supporter', 'Synonymizer', 'Tag Editor', 'Research Assistant', 'Taxonomist', 'Vox Populi']

    # Tag category is represented as 0 (the default for any name not listed above)
    df['Category'] = 0

    # Check every row against each list in turn
    for i in range(len(df)):
        if df.loc[i, "Name"] in questionCategory:
            df.loc[i, 'Category'] = 1
        elif df.loc[i, "Name"] in answerCategory:
            df.loc[i, 'Category'] = 2
        elif df.loc[i, "Name"] in participationCategory:
            df.loc[i, 'Category'] = 3
        elif df.loc[i, "Name"] in moderationCategory:
            df.loc[i, 'Category'] = 4
    return df
category(stackoverflow_badges)
Expected Output
UserId | Name | Category
1 | Altruist | 1
2 | Autobiographer | 3
3 | Enlightened | 2
4 | Citizen Patrol | 4
5 | python | 0
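For reference, the expected output above can probably be produced without a row-by-row loop. The following is only a sketch, assuming a dictionary lookup through Series.map is acceptable. The names CATEGORY_CODES, BADGE_TO_CATEGORY and add_category are hypothetical, and the badge lists are shortened to a few entries each, so they would need to be extended with the full lists from the code section.

```python
import pandas as pd

# Hypothetical lookup: category code -> badge names.
# Shortened for illustration; extend with the full lists from the question.
CATEGORY_CODES = {
    1: ["Altruist", "Benefactor", "Curious"],        # Question
    2: ["Enlightened", "Explainer", "Refiner"],      # Answer
    3: ["Autobiographer", "Caucus", "Constituent"],  # Participation
    4: ["Citizen Patrol", "Deputy", "Marshal"],      # Moderation
}

# Invert to badge name -> category code so each lookup is a single dict access.
BADGE_TO_CATEGORY = {
    name: code for code, names in CATEGORY_CODES.items() for name in names
}


def add_category(df):
    """Return a copy of df with a Category column; unlisted names (tags) get 0."""
    out = df.copy()
    # map() yields NaN for names missing from the dict; those become 0 (Tag).
    out["Category"] = out["Name"].map(BADGE_TO_CATEGORY).fillna(0).astype(int)
    return out


badges = pd.DataFrame({
    "UserId": [1, 2, 3, 4, 5],
    "Name": ["Altruist", "Autobiographer", "Enlightened", "Citizen Patrol", "python"],
})
result = add_category(badges)
print(result)
```

The mapping is applied to the whole Name column in one call instead of one df.loc access per row, which is usually where the time goes on large frames.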