Python Pandas Datetime Parser, Group and Find First Value

Question

I have the following challenge:

I have a dataframe that looks like this after selecting the column I need from an import:

user_id    datetime
1          1473225887
1          1373225887
1          1673225887
2          1173225887
2          1573225887

What I would like to do is twofold: (1) convert the datetime values from Unix timestamps to a normal date notation, using the datetime function. I have not managed to do this yet.

(2) group the data on user_id, and only keep the first datetime (so the earliest date) of every user_id.

The code that I have written so far is below. Note that I am a beginner in Python and have not yet learned classes, so I'd like to start off without them.

I hope you can help me out here! Thanks a lot in advance!

import pandas as pd

def run():
    engagement_dataset = import_engagements()
    engagement_dataset_first_event = first_engagement(engagement_dataset)

def import_engagements():
    df_engagements = pd.read_csv('df_engagements.csv',
                                 sep=';')
    required_columns = ['engagement_unix_timestamp', 'user_id']
    df_engagements = df_engagements[required_columns]
    df_engagements.rename(columns={'engagement_unix_timestamp': 'datetime'}, inplace=True)
    return df_engagements

def first_engagement(engagement_dataset):
    engagement_dataset_grouped = engagement_dataset.groupby(['user_id'])['datetime'].idxmin().reset_index()
    return engagement_dataset_grouped

run()

[Here](https://stackoverflow.com/questions/19801727/convert-datetime-to-unix-timestamp-and-convert-it-back-in-python) is an answer that discusses Unix datetime conversions. For the second part, you should be able to use `groupby().min()` rather than `idxmin` to get you started. – G. Anderson, Oct 25 '18 at 15:36

Franco Piccolo · Accepted Answer · 2018-10-25T16:06:40.217

(1) You can convert a Unix timestamp (seconds since the epoch) to a datetime with:

df['datetime_formatted'] = pd.to_datetime(df['datetime'], unit='s')
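As a minimal sketch of that conversion, assuming the sample values from the question are built directly into a DataFrame (rather than read from the CSV) and that the timestamps are in seconds:

```python
import pandas as pd

# Sample data copied from the question; in the real script this
# would come from import_engagements().
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "datetime": [1473225887, 1373225887, 1673225887, 1173225887, 1573225887],
})

# unit='s' tells pandas the integers are seconds since 1970-01-01 (UTC).
df["datetime_formatted"] = pd.to_datetime(df["datetime"], unit="s")

print(df)
```

If the timestamps were in milliseconds instead, `unit='ms'` would be the matching choice.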

(2) Then you can group by user_id and use agg to find the minimum (earliest) date for each user:

df.groupby('user_id').agg({'datetime_formatted':'min'})
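Putting both steps together might look like the sketch below. It again assumes the sample data from the question; the names `first_engagement` and `first_rows` are only illustrative. It also shows why `idxmin` in the question did not return dates: it returns the index label of the minimum row, which can be passed to `.loc` if the whole row is wanted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "datetime": [1473225887, 1373225887, 1673225887, 1173225887, 1573225887],
})
df["datetime_formatted"] = pd.to_datetime(df["datetime"], unit="s")

# Earliest datetime per user; reset_index() turns user_id back into a column.
first_engagement = (
    df.groupby("user_id")
      .agg({"datetime_formatted": "min"})
      .reset_index()
)
print(first_engagement)

# Alternative: idxmin() gives the row label of each user's earliest
# timestamp, so .loc can pull out the complete rows.
first_rows = df.loc[df.groupby("user_id")["datetime"].idxmin()]
print(first_rows)
```

The `agg` version is enough when only the earliest date is needed; the `idxmin` plus `.loc` version is useful when other columns from that same row should be kept.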

edited Oct 25 '18 at 16:06

answered Oct 25 '18 at 15:42

Please add explanations to your answer. – Matthieu Brucher Oct 25 '18 at 16:04
