I have a dataframe where each row has an author (news channel), the title of the article, and the number of comments on that article.

Basically, I want to calculate the number of comments on each article. I currently have the following code, but I want to refactor it.

import pandas as pd

# CSV of news articles, with authors, articles, and comments
df = pd.read_csv('articles.csv')

# Counts per author
art_count = df['AUTHOR'].value_counts()

# Sum the `target` column over the rows where `channel` equals `channel_name`
def comment_sum(df, channel, channel_name, target):
    # ex) sum(df[df['AUTHOR'] == 'NYTIMES']['COMMENTS'])
    return sum(df[df[channel] == channel_name][target])

# Total comments for each author, in the same order as art_count
com_count = []
for newspaper in art_count.index:
    com_count.append(comment_sum(df, 'AUTHOR', newspaper, 'COMMENTS'))

I feel as if I can simplify my code, without declaring a method, by using a map and lambda function, but I'm unsure how to go about it.

gust
  • Have you heard of groupby in pandas? Check this one out: https://stackoverflow.com/questions/30679467/pivot-tables-or-group-by-for-pandas – PV8 Oct 02 '19 at 09:28
  • `df.groupby(['AUTHOR','article'])['COMMENTS'].sum()`? ('article' is the column of articles) – ansev Oct 02 '19 at 09:41
  • If either of you post as an answer, I can select the answer. – gust Oct 04 '19 at 06:06

1 Answer

For posterity, as @ansev answered in the comments: `df.groupby(['AUTHOR','article'])['COMMENTS'].sum()`
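
As a rough sketch of how groupby could replace the helper function and loop from the question: the small DataFrame below is made-up data standing in for articles.csv, and the 'article' column name is an assumption taken from the comment above, so adjust both to your actual file.

```python
import pandas as pd

# Hypothetical sample data in place of articles.csv.
# AUTHOR and COMMENTS are the column names from the question;
# 'article' is an assumed name for the title column.
df = pd.DataFrame({
    'AUTHOR': ['NYTIMES', 'NYTIMES', 'BBC', 'BBC', 'BBC'],
    'article': ['a1', 'a2', 'b1', 'b1', 'b2'],
    'COMMENTS': [10, 7, 3, 4, 8],
})

# Total comments per author: one call instead of comment_sum plus the loop.
per_author = df.groupby('AUTHOR')['COMMENTS'].sum()

# If the list must follow the order of value_counts(), as in the question,
# reindex the grouped result by that index before converting to a list.
art_count = df['AUTHOR'].value_counts()
com_count = per_author.reindex(art_count.index).tolist()

# Total comments per (author, article) pair, as in the answer above.
per_article = df.groupby(['AUTHOR', 'article'])['COMMENTS'].sum()

print(per_author)
print(com_count)
print(per_article)
```

Grouping by 'AUTHOR' alone matches what the original loop computes; adding 'article' to the group keys gives a finer breakdown, which is only useful if the same article can appear in more than one row.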

gust