0

I'm trying to group my df by a column named "id", my code is working my only problem is that my grouped_df return number (e.g., 43248 or 43249) in the "created_date" and in the "date" column instead of the actual original date. This is my code:

unique_df['created_date'] = pd.to_datetime(unique_df['created_date'], format='%Y-%m-%d')
unique_df['date'] = pd.to_datetime(unique_df['date'], format='%Y-%m-%d')


aggregations = {
    'username': 'first',
    'created_date': 'first',
    'id': 'count',
    'date': ['first', 'last'],
    'like_count': 'sum',
    'list_count': ['first', 'last']   

}

grouped_df = unique_df.groupby('id').agg(aggregations)

grouped_df['created_date'] = pd.to_datetime(grouped_df['created_date'], format='%Y-%m-%d')
grouped_df['date_first'] = pd.to_datetime(grouped_df['date_first'], format='%Y-%m-%d')
grouped_df['date_last'] = pd.to_datetime(grouped_df['date_last'], format='%Y-%m-%d')

How can I keep the original date format and avoid having number instead of date?

aetosti
  • 128
  • 6
  • So need convert excel dates to pandas datetimes? Like in linked answers? – jezrael Feb 23 '23 at 10:14
  • @jezrael that shouldn't be needed at all, `read_excel` returns Python dates from Excel date values. Trying to *fix* dates is what causes problems. – Panagiotis Kanavos Feb 23 '23 at 10:19
  • @aetosti where did the data come from? Excel supports dates natively so there's no need for `pd.to_datetime(unique_df['created_date'], format='%Y-%m-%d')`. In fact, this will *cause* problems if `unique_df['created_date']` is already a `datetime`. The raw values in Excel are an offset from `1900-01-01`, which is why dates can appear as `43248` if the wrong numeric format style is used – Panagiotis Kanavos Feb 23 '23 at 10:21
  • The one thing that *shouldn't* be done, is trying any of the non-solutions in the linked question. It's better to load the data properly than trying to reverse the problem after the fact. A correct duplicate exists, [Pandas Read_Excel Datetime Converter](https://stackoverflow.com/questions/42958217/pandas-read-excel-datetime-converter) which shows what to do when Pandas or rather openpyxl, fails to detect a date column – Panagiotis Kanavos Feb 23 '23 at 10:25
  • `read_excel` or rather `openpyxl` may fail to recognize date fields. This can happen in Excel files generated by applications and reporting engines. It can also happen if users try to force a specific format by converting dates to strings. In that case you can specify the column's type through `parse_dates`, `dtype` or use a specific conversion through `converters` – Panagiotis Kanavos Feb 23 '23 at 10:30

0 Answers0