
I have the following two dataframes. posts shows when a particular post was published, together with the publisher's UserId (a user can make more than one post). badges shows the date and time when a particular user attained a badge. I have shown only part of each.

I want to create a line plot of the mean number of posts made by users before and after attaining the badge. The x-axis should cover the days from one week before to one week after the badge was attained, and the y-axis should show the mean number of posts made by users on each of those days.
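The question leaves the averaging open, so the following definition is an assumed reading. Let B be the number of badge events. Let u_b and t_b be the user and time of badge b, and let P_u be the set of creation times of user u's posts. The value plotted at day d is then:

```latex
\bar{p}(d) = \frac{1}{B} \sum_{b=1}^{B} \left| \left\{ t \in P_{u_b} : \left\lfloor \frac{t - t_b}{1\,\text{day}} \right\rfloor = d \right\} \right|, \qquad d = -7, \dots, 7
```

Under this definition, days on which a user did not post contribute zero, the mean is taken over badge events rather than distinct users, and negative d means before the badge.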

I tried the following code, but I get AttributeError: 'function' object has no attribute 'line'. How can I fix this?

Code example (dataset generation and function):

import pandas as pd
from matplotlib import pyplot as plt

posts = pd.DataFrame({
    'Creation Date': [
        pd.Timestamp('2009-09-28 16:11:38.533'),
        pd.Timestamp('2009-09-28 17:42:23.207'),
        pd.Timestamp('2009-09-28 19:41:13.933'),
        pd.Timestamp('2009-09-28 23:40:55.033')],
    'UserId': [1,2,4,1]
})

badges = pd.DataFrame({
    'UserId': [143, 1, 344],
    'Date': [
        pd.Timestamp('2009-10-17 17:38:32.590'),
        pd.Timestamp('2009-10-19 00:37:23.067'),
        pd.Timestamp('2009-10-20 08:37:14.143')
    ]
})

plt.plot.line(x=(posts['UserId'].CreationDate < badges['UserId'].Date), y=(posts['UserId'].value_counts.mean()))
Itamar Mushkin
Ishan Dutta
  • Where exactly are you getting the attribute error? Please post the error message. – Itamar Mushkin Jul 16 '20 at 13:06
  • I have added the full error message in the question. – Ishan Dutta Jul 16 '20 at 13:11
  • Good. Also, I've edited your question to be in the form of a reproducible pandas question. In the future, please follow https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Itamar Mushkin Jul 16 '20 at 13:18
  • The direct answer to your question is that `plt.plot` is a function, so you need to call `plt.plot(x, y)` (x and y passed positionally) instead of `plt.plot.line(x=..., y=...)`. The `.plot.line(x=..., y=...)` form belongs to pandas' `DataFrame.plot`. That change alone does not produce working code, though. For example, `posts['UserId'].CreationDate` accesses an attribute that isn't there, so the logic inside the plot call also needs to be fixed. – Itamar Mushkin Jul 16 '20 at 13:26
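The following is a minimal illustration of the difference between the two plotting styles. The DataFrame df and its values are made up for illustration. With pyplot, x and y are passed positionally to the plt.plot function. With pandas, DataFrame.plot is an accessor object that does provide a .line method taking column names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
from matplotlib import pyplot as plt

# plt.plot is an ordinary function, so plt.plot.line raises AttributeError
has_line = hasattr(plt.plot, "line")  # False

# hypothetical data, only to show the two call styles
df = pd.DataFrame({"day": [-1, 0, 1], "mean_posts": [0.5, 1.0, 2.0]})

# pyplot style: pass the x and y values positionally
lines = plt.plot(df["day"], df["mean_posts"])

# pandas style: DataFrame.plot is an accessor with a .line method taking column names
ax = df.plot.line(x="day", y="mean_posts")
```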

1 Answer


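These two functions count how many times a given user posted before and after their badge. If the user holds several badges, the counts are added up over all of them. Note that they count all posts, not only those within one week of the badge.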

def before(user_id):
    # total number of this user's posts made before each of their badges
    count = 0
    for badge_date in badges[badges.UserId==user_id].Date.values:
        count += posts[(posts['Creation Date'] < badge_date) & (posts['UserId'] == user_id)].UserId.count()
    return count

def after(user_id):
    # same as before(), but counts posts made after each badge
    count = 0
    for badge_date in badges[badges.UserId==user_id].Date.values:
        count += posts[(posts['Creation Date'] > badge_date) & (posts['UserId'] == user_id)].UserId.count()
    return count

Apply them to the badges dataframe:

badges['before']= badges.UserId.apply(before)
badges['after']= badges.UserId.apply(after)
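The same per-user totals can be computed without a Python loop per badge by pairing posts and badges with DataFrame.merge. The sketch below is a swapped-in technique, not the answerer's code. The function name before_after_counts is hypothetical. Like the loops, it uses strict comparisons, so a post created exactly at the badge time is counted in neither column.

```python
import pandas as pd


def before_after_counts(posts, badges):
    """Per user: number of (post, badge) pairs where the post is strictly
    before / strictly after the badge, i.e. the before()/after() totals."""
    # one row per (post, badge) pair belonging to the same user
    merged = posts.merge(badges, on="UserId")
    merged["before"] = merged["Creation Date"] < merged["Date"]
    merged["after"] = merged["Creation Date"] > merged["Date"]
    # summing booleans counts the True values
    counts = merged.groupby("UserId")[["before", "after"]].sum()
    # badge holders without any posts get 0 instead of being dropped
    return counts.reindex(badges["UserId"].unique(), fill_value=0)


posts = pd.DataFrame({
    "Creation Date": [
        pd.Timestamp("2009-09-28 16:11:38.533"),
        pd.Timestamp("2009-09-28 17:42:23.207"),
        pd.Timestamp("2009-09-28 19:41:13.933"),
        pd.Timestamp("2009-09-28 23:40:55.033")],
    "UserId": [1, 2, 4, 1],
})
badges = pd.DataFrame({
    "UserId": [143, 1, 344],
    "Date": [
        pd.Timestamp("2009-10-17 17:38:32.590"),
        pd.Timestamp("2009-10-19 00:37:23.067"),
        pd.Timestamp("2009-10-20 08:37:14.143")],
})

counts = before_after_counts(posts, badges)
```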

To aggregate the results, you could use:

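# before()/after() already sum over all of a user's badges, so keep one value per user
# with first(); sum() would double-count users who hold several badges
before_df = pd.DataFrame(badges.groupby('UserId').before.first())
before_df['id'] = before_df.index
# count users (id) for each number of posts made before the badge (index)
before_df = pd.DataFrame(before_df.groupby('before').id.count())
after_df = pd.DataFrame(badges.groupby('UserId').after.first())
after_df['id'] = after_df.index
after_df = pd.DataFrame(after_df.groupby('after').id.count())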

The resulting before_df and after_df are indexed by number of posts. Their id column holds how many users made that many posts before (for before_df) or after (for after_df) receiving their badge.
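The same distribution can also be obtained more compactly with Series.value_counts. The sketch below uses made-up per-badge rows standing in for the output of the apply() step. In this made-up data, user 1 holds two badges, so the value repeated on both of that user's rows must be taken only once per user.

```python
import pandas as pd

# hypothetical rows after badges['before'/'after'] = ...apply(...); user 1 holds two badges
badges = pd.DataFrame({
    "UserId": [1, 1, 143],
    "before": [3, 3, 0],
    "after": [1, 1, 0],
})

# the per-row values are already per-user totals, so keep one row per user
per_user = badges.groupby("UserId")[["before", "after"]].first()

# number of users (values) who made k posts before / after their badges (index k)
before_dist = per_user["before"].value_counts().sort_index()
after_dist = per_user["after"].value_counts().sort_index()
```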

Does this get the job done?

Sunny
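Neither the question's code nor the answer restricts the counts to one week around the badge. The sketch below builds that windowed line plot under stated assumptions, not as a definitive implementation:

- "mean" is taken over badge events (each row of badges);
- days without posts count as zero;
- day offsets are rounded down to whole days.

The function name mean_posts_around_badge and its window parameter are hypothetical. Because the total over badge events divided by their number equals the per-badge mean, the code only needs the total post count per relative day.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or GUI session
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def mean_posts_around_badge(posts, badges, window=7):
    """Mean number of posts per badge event for each whole day in
    [-window, window] relative to the badge time (negative = before).

    Assumptions: each row of `badges` is one badge event, and days on
    which the user did not post count as zero.
    """
    # pair every post with every badge earned by the same user
    merged = posts.merge(badges, on="UserId")
    # days from badge to post, rounded down (e.g. -0.5 days -> day -1)
    offset = (merged["Creation Date"] - merged["Date"]) / pd.Timedelta(days=1)
    merged["day"] = np.floor(offset).astype(int)
    in_window = merged[merged["day"].between(-window, window)]
    # total posts on each relative day, filling days without posts with 0
    totals = in_window.groupby("day").size().reindex(
        range(-window, window + 1), fill_value=0
    )
    # averaging over badge events = dividing the totals by their number
    return totals / len(badges)


posts = pd.DataFrame({
    "Creation Date": [
        pd.Timestamp("2009-09-28 16:11:38.533"),
        pd.Timestamp("2009-09-28 17:42:23.207"),
        pd.Timestamp("2009-09-28 19:41:13.933"),
        pd.Timestamp("2009-09-28 23:40:55.033")],
    "UserId": [1, 2, 4, 1],
})
badges = pd.DataFrame({
    "UserId": [143, 1, 344],
    "Date": [
        pd.Timestamp("2009-10-17 17:38:32.590"),
        pd.Timestamp("2009-10-19 00:37:23.067"),
        pd.Timestamp("2009-10-20 08:37:14.143")],
})

result = mean_posts_around_badge(posts, badges)

fig, ax = plt.subplots()
ax.plot(result.index, result.values, marker="o")
ax.axvline(0, color="grey", linestyle="--")  # the day the badge was attained
ax.set_xlabel("Days relative to badge")
ax.set_ylabel("Mean posts per badge")
# plt.show()  # display the figure in an interactive session
```

With this small sample, every plotted value is 0. UserId 1, the only badge holder with posts, posted about three weeks before the badge, which is outside the window.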