Weighted average per day from inconsistent seconds data for each id in pandas data frame

Question

I have a data frame which is indexed by with inconsistent datetime objects. I have seen similar examples where values can be averaged per day, but not the average per day for each id. I could create a new data frame for each selection_id but i assume there is a better way that i just cant find online.

In the data frame i have is:

                                             | selection_id  | price           |
                                             | ------------- | --------------- |
                    |2023-05-13 05:57:07.554 | 1             | 1.50            |
                    |2023-05-13 06:08:59.193 | 1             | 1.56            |
                    |2023-05-13 06:08:59.085 | 1             | 1.61            |
                    |2023-05-13 06:08:59.085 | 1             | 1.50            |
                    |2023-05-13 06:08:59.085 | 1             | 1.51            |
                    |2023-05-13 06:08:59.085 | 45            | 3.12            |
                    |2023-05-13 05:57:07.554 | 45            | 3.16            |
                    |2023-05-13 06:08:59.193 | 45            | 3.12            |
                    |2023-05-13 06:08:59.085 | 45            | 3.16            |
                    |2023-05-13 06:08:59.085 | 45            | 3.12            |
                    |2023-05-13 06:08:59.085 | 98            | 7.05            |
                    |2023-05-13 06:08:59.085 | 98            | 7.52            |
                    |2023-05-13 05:57:07.554 | 98            | 7.11            |
                    |2023-05-13 06:08:59.193 | 98            | 7.99            |
                    |2023-05-13 06:08:59.085 | 98            | 7.50            |
                    |2023-05-13 06:08:59.085 | 98            | 7.20            |
                    |2023-05-13 06:08:59.085 | 98            | 7.65            |
                    |2023-05-13 06:08:59.085 | 98            | 7.45            |
                    |2023-05-14 05:57:07.554 | 1             | 2.50            |
                    |2023-05-14 06:08:59.193 | 1             | 2.56            |
                    |2023-05-14 06:08:59.085 | 1             | 2.61            |
                    |2023-05-14 06:08:59.085 | 1             | 2.50            |
                    |2023-05-14 06:08:59.085 | 1             | 2.51            |
                    |2023-05-14 06:08:59.085 | 45            | 2.12            |
                    |2023-05-14 05:57:07.554 | 45            | 2.16            |
                    |2023-05-14 06:08:59.193 | 45            | 2.12            |
                    |2023-05-14 06:08:59.085 | 45            | 2.16            |
                    |2023-05-14 06:08:59.085 | 45            | 2.12            |
                    |2023-05-14 06:08:59.085 | 98            | 7.05            |
                    |2023-05-14 06:08:59.085 | 98            | 7.52            |
                    |2023-05-14 05:57:07.554 | 98            | 7.11            |
                    |2023-05-14 06:08:59.193 | 98            | 7.99            |
                    |2023-05-14 06:08:59.085 | 98            | 7.50            |
                    |2023-05-14 06:08:59.085 | 98            | 7.20            |
                    |2023-05-14 06:08:59.085 | 98            | 7.65            |
                    |2023-05-14 06:08:59.085 | 98            | 7.45            |

And i want to summerise the data frame to:

                                             | selection_id   | price            |
                                             | -------------- | ---------------- |
                    |2023-05-13              | 1              | 1.536            |
                    |2023-05-13              | 45             | 3.136            |
                    |2023-05-13              | 98             | 7.434            |
                    |2023-05-14              | 1              | 2.536            |
                    |2023-05-14              | 45             | 2.136            |
                    |2023-05-14              | 98             | 7.434            |

TanjiroLL · Answer 1 · 2023-05-19T12:06:13.700

1

To obtain the date without the time, you can utilize dt.date. Then use groupby to group the data by date and selection id, after that, apply agg to compute the mean for each group.:

df = pd.DataFrame(data)

df['datetime'] = pd.to_datetime(df['datetime'])

summary_df = df.groupby([df['datetime'].dt.date, 'selection_id']).agg({'price': 'mean'}).reset_index()

summary_df.head()

edited May 19 '23 at 12:06

answered May 19 '23 at 11:58

TanjiroLL

1,354
1
5
5

score 1 · Accepted Answer · answered May 19 '23 at 12:01

You can use :

out = (
    df.groupby([pd.to_datetime(df.index).date, #or df.index.date if it's a DatetimeIndex
                "selection_id"])["price"].mean().reset_index(level=1)
      )

Output :

print(out)

           selection_id  price
2023-05-13            1   1.54
2023-05-13           45   3.14
2023-05-13           98   7.43
2023-05-14            1   2.54
2023-05-14           45   2.14
2023-05-14           98   7.43

Weighted average per day from inconsistent seconds data for each id in pandas data frame

2 Answers2