0

I have a data frame which is indexed by with inconsistent datetime objects. I have seen similar examples where values can be averaged per day, but not the average per day for each id. I could create a new data frame for each selection_id but i assume there is a better way that i just cant find online.

In the data frame i have is:

                                             | selection_id  | price           |
                                             | ------------- | --------------- |
                    |2023-05-13 05:57:07.554 | 1             | 1.50            |
                    |2023-05-13 06:08:59.193 | 1             | 1.56            |
                    |2023-05-13 06:08:59.085 | 1             | 1.61            |
                    |2023-05-13 06:08:59.085 | 1             | 1.50            |
                    |2023-05-13 06:08:59.085 | 1             | 1.51            |
                    |2023-05-13 06:08:59.085 | 45            | 3.12            |
                    |2023-05-13 05:57:07.554 | 45            | 3.16            |
                    |2023-05-13 06:08:59.193 | 45            | 3.12            |
                    |2023-05-13 06:08:59.085 | 45            | 3.16            |
                    |2023-05-13 06:08:59.085 | 45            | 3.12            |
                    |2023-05-13 06:08:59.085 | 98            | 7.05            |
                    |2023-05-13 06:08:59.085 | 98            | 7.52            |
                    |2023-05-13 05:57:07.554 | 98            | 7.11            |
                    |2023-05-13 06:08:59.193 | 98            | 7.99            |
                    |2023-05-13 06:08:59.085 | 98            | 7.50            |
                    |2023-05-13 06:08:59.085 | 98            | 7.20            |
                    |2023-05-13 06:08:59.085 | 98            | 7.65            |
                    |2023-05-13 06:08:59.085 | 98            | 7.45            |
                    |2023-05-14 05:57:07.554 | 1             | 2.50            |
                    |2023-05-14 06:08:59.193 | 1             | 2.56            |
                    |2023-05-14 06:08:59.085 | 1             | 2.61            |
                    |2023-05-14 06:08:59.085 | 1             | 2.50            |
                    |2023-05-14 06:08:59.085 | 1             | 2.51            |
                    |2023-05-14 06:08:59.085 | 45            | 2.12            |
                    |2023-05-14 05:57:07.554 | 45            | 2.16            |
                    |2023-05-14 06:08:59.193 | 45            | 2.12            |
                    |2023-05-14 06:08:59.085 | 45            | 2.16            |
                    |2023-05-14 06:08:59.085 | 45            | 2.12            |
                    |2023-05-14 06:08:59.085 | 98            | 7.05            |
                    |2023-05-14 06:08:59.085 | 98            | 7.52            |
                    |2023-05-14 05:57:07.554 | 98            | 7.11            |
                    |2023-05-14 06:08:59.193 | 98            | 7.99            |
                    |2023-05-14 06:08:59.085 | 98            | 7.50            |
                    |2023-05-14 06:08:59.085 | 98            | 7.20            |
                    |2023-05-14 06:08:59.085 | 98            | 7.65            |
                    |2023-05-14 06:08:59.085 | 98            | 7.45            |

And i want to summerise the data frame to:

                                             | selection_id   | price            |
                                             | -------------- | ---------------- |
                    |2023-05-13              | 1              | 1.536            |
                    |2023-05-13              | 45             | 3.136            |
                    |2023-05-13              | 98             | 7.434            |
                    |2023-05-14              | 1              | 2.536            |
                    |2023-05-14              | 45             | 2.136            |
                    |2023-05-14              | 98             | 7.434            |
user3560858
  • 55
  • 1
  • 7

2 Answers2

1

To obtain the date without the time, you can utilize dt.date. Then use groupby to group the data by date and selection id, after that, apply agg to compute the mean for each group.:

df = pd.DataFrame(data)

df['datetime'] = pd.to_datetime(df['datetime'])

summary_df = df.groupby([df['datetime'].dt.date, 'selection_id']).agg({'price': 'mean'}).reset_index()

summary_df.head()
TanjiroLL
  • 1,354
  • 1
  • 5
  • 5
1

You can use :

out = (
    df.groupby([pd.to_datetime(df.index).date, #or df.index.date if it's a DatetimeIndex
                "selection_id"])["price"].mean().reset_index(level=1)
      )

Output :

print(out)

           selection_id  price
2023-05-13            1   1.54
2023-05-13           45   3.14
2023-05-13           98   7.43
2023-05-14            1   2.54
2023-05-14           45   2.14
2023-05-14           98   7.43
Timeless
  • 22,580
  • 4
  • 12
  • 30