I have the following challenge:
I have a dataframe that looks like this after selecting the column I need from an import:
user_id datetime
1 1473225887
1 1373225887
1 1673225887
2 1173225887
2 1573225887
What I would like to do is twofold: (1) convert the datetime values from Unix timestamps to a normal date notation, using a datetime function. I have not managed to do this yet.
(2) group the data by user_id and keep only the first datetime (that is, the earliest date) for each user_id.
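As a rough illustration of these two steps, here is a minimal sketch on the sample data above. It assumes the timestamps are seconds since the Unix epoch (UTC), which the question does not state explicitly; the variable names `df` and `first_events` are made up for the example.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "datetime": [1473225887, 1373225887, 1673225887, 1173225887, 1573225887],
})

# Step 1: convert Unix timestamps to pandas datetimes.
# Assumption: the values are in seconds, hence unit="s".
df["datetime"] = pd.to_datetime(df["datetime"], unit="s")

# Step 2: keep only the earliest datetime per user_id.
# as_index=False keeps user_id as a regular column in the result.
first_events = df.groupby("user_id", as_index=False)["datetime"].min()

print(first_events)
```

If the real timestamps turn out to be in milliseconds, `unit="ms"` would be the value to try instead.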
The code that I have written so far is below. Note that I am a beginner in Python and have not yet learned how to use classes, so I'd like to start off without them.
I hope you can help me out here! Thanks a lot in advance!
import pandas as pd

def run():
    engagement_dataset = import_engagements()
    engagement_dataset_first_event = first_engagement(engagement_dataset)

def import_engagements():
    df_engagements = pd.read_csv('df_engagements.csv', sep=';')
    required_columns = ['engagement_unix_timestamp', 'user_id']
    df_engagements = df_engagements[required_columns]
    df_engagements.rename(columns={'engagement_unix_timestamp': 'datetime'}, inplace=True)
    return df_engagements

def first_engagement(engagement_dataset):
    engagement_dataset_grouped = engagement_dataset.groupby(['user_id'])['datetime'].idxmin().reset_index()
    return engagement_dataset_grouped

run()
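One thing worth noting about the code above: `idxmin()` returns the row label of the smallest value in each group, not the value itself, so the `datetime` column of the result holds index positions rather than dates. A possible way to use those labels, sketched below, is to pass them to `.loc` and pull out the full rows. The function name `first_engagement_rows` and the extra `channel` column are hypothetical, added only to show that other columns survive; the timestamps are again assumed to be in seconds.

```python
import pandas as pd

def first_engagement_rows(engagement_dataset):
    # idxmin() gives, per user_id, the row label of the earliest datetime
    earliest_labels = engagement_dataset.groupby("user_id")["datetime"].idxmin()
    # .loc selects those rows, keeping every column of the original frame
    return engagement_dataset.loc[earliest_labels].reset_index(drop=True)

# Sample data from the question, plus a hypothetical "channel" column
sample = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "datetime": [1473225887, 1373225887, 1673225887, 1173225887, 1573225887],
    "channel": ["app", "web", "app", "email", "web"],
})

# Assumption: timestamps are seconds since the Unix epoch
sample["datetime"] = pd.to_datetime(sample["datetime"], unit="s")

result = first_engagement_rows(sample)
print(result)
```

If only `user_id` and the earliest `datetime` are needed, `groupby("user_id")["datetime"].min()` is the shorter route; the `idxmin()` plus `.loc` pattern is mainly useful when the rest of the row should be kept as well.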