
I have a question about pandas. My dataframe's columns are celebrities, date (YYYY-MM-DD), and No. of followers. Each row shows the number of new followers for a celebrity on that date.

sample data

However, I would like to calculate the average number of new followers between 2020/1/1 and 2020/4/1 for each celebrity, as a table with only two columns: the celebrity and the no. of followers.

what I want to look like

How do I write Python code for this?

Thank you very much!

Jay
Avery
  • Try: `df.groupby(['celebrities'])['No. of followers'].mean()` – hacker315 May 10 '20 at 16:17
  • Does this answer your question? [Get statistics for each group (such as count, mean, etc) using pandas GroupBy?](https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby) – Joe May 10 '20 at 17:04
  • Hi! Thank you for your answer Joe. This is really useful with how to calculate the mean with the groupby function. However, I'm still not too sure how to incorporate the date filter with the groupby function. – Avery May 10 '20 at 17:53

2 Answers


You can use groupby to gather all rows by celebrity.

    df_grouped = df.groupby('celebrities')
    for name, group in df_grouped:
        # pandas has no .avg() method; the average is .mean()
        print(name, group['Followers'].mean())

This prints the average number of followers for each celebrity. You can add your date filter too if you like, e.g. `group[group['Date'] > X]['Followers'].mean()`.
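The filter and the average can also be combined without a loop. Here is a minimal sketch, assuming the columns are named `celebrities`, `Date`, and `Followers` as in the snippet above; the sample values are made up purely for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data; column names follow this answer and may differ from the real dataframe.
df = pd.DataFrame({
    "celebrities": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-05-01",
                            "2020-01-15", "2020-03-01", "2020-06-01"]),
    "Followers": [10, 20, 300, 4, 6, 500],
})

# Keep only rows inside the date range (inclusive on both ends) ...
in_range = df["Date"].between("2020-01-01", "2020-04-01")

# ... then average per celebrity and turn the result back into a two-column table.
avg_followers = df[in_range].groupby("celebrities")["Followers"].mean().reset_index()
print(avg_followers)
```

Filtering before grouping means rows outside the range never enter the average, and `reset_index()` gives a plain table with one row per celebrity.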

Roim
  • Hey Roim! Thank you for the solution. Any chance I'm able to do this in one line? – Avery May 10 '20 at 17:06
  • Hmm, I think `df.groupby(['celebrities'])['Followers'].mean()` will work, but I didn't try it myself – Roim May 10 '20 at 20:18

If you want to incorporate the date filter, you need to filter your dataframe first:

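    import numpy as np
    import pandas as pd

    # Column names here are assumed; adjust them to match your dataframe
    # Parse the date column so it can be compared against the range bounds
    df["Date"] = pd.to_datetime(df["Date"])
    start_date = '2020/1/1'
    end_date = '2020/4/1'

    # Keep only rows inside the date range (both ends inclusive)
    mask = (df["Date"] >= start_date) & (df["Date"] <= end_date)
    df = df.loc[mask]

    # Average followers per celebrity, with "Celebrity" as a regular column
    grouped = df.groupby("Celebrity").agg({"No. Followers": "mean"}).reset_index()

    celebrities = np.unique(grouped["Celebrity"])

    # One single-row dataframe per celebrity, keyed by name
    dfs = {}

    for c in celebrities:
        dfs[c] = grouped[grouped["Celebrity"] == c]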

`grouped` already holds one row per celebrity with the average. You can also access each celebrity's dataframe through the `dfs` dictionary, using the celebrity name as the key.
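As a rough illustration of the dictionary lookup, here is a small self-contained sketch. The `grouped` table below is a hypothetical stand-in with made-up values, shaped like the result of the code above.

```python
import pandas as pd

# Hypothetical per-celebrity averages, shaped like `grouped` in the code above.
grouped = pd.DataFrame({
    "Celebrity": ["A", "B"],
    "No. Followers": [15.0, 5.0],
})

# One single-row dataframe per celebrity, keyed by name.
dfs = {c: grouped[grouped["Celebrity"] == c] for c in grouped["Celebrity"].unique()}

# Look up one celebrity's dataframe, or read the average out as a scalar.
print(dfs["A"])
avg_a = dfs["A"]["No. Followers"].iloc[0]
print(avg_a)
```

If you only need the summary table, `grouped` on its own is enough; the dictionary is just a convenience for per-celebrity lookups.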

Hope that helps and please let me know if this answers your question.

matt.aurelio