Calculate the average of several columns and insert it in a new column

Question

I have a pandas dataframe looking like this:

Right now I have 96 time columns per day (one every 15 minutes). Instead, I want 24 columns (one per hour). So for every row I want to calculate the average of 00:00, 00:15, 00:30 and 00:45 and put it into a new column 00:00, and likewise for the other 23 hours. Can anybody help me out?

Does this answer your question? [Pandas: sum of every N columns](https://stackoverflow.com/questions/46607027/pandas-sum-of-every-n-columns) — Matt Hall, Aug 22 '23 at 11:50
Try looking at [How to Calculate the Average of Selected Columns in Pandas](https://www.statology.org/pandas-average-selected-columns/). — Alias Cartellano, Aug 22 '23 at 18:00
[Please don't post pictures of text](//meta.stackoverflow.com/q/285551/4518341). Instead, copy the text itself, [edit] it into your post, and use the formatting tools like [code formatting](/editing-help#code). Ideally, provide a [mre] including complete example data and desired output (e.g. give us, say, 2 hours worth of 2 rows and the result you'd want). For specifics: [How to make good reproducible pandas examples](/q/20109391/4518341). — wjandrea, Aug 24 '23 at 02:53
What do you need help with exactly? Like, what have you tried, and where are you stuck? Are you aware you can `groupby` on columns? Maybe resample would work too? — wjandrea, Aug 24 '23 at 02:55

score 1 · Answer 1 · answered Aug 24 '23 at 02:50

You didn't provide a data frame constructor, so I can't test this against your data. The idea is to take your current data frame and stack it, which moves the time labels from the columns into the index. Then, grouping by the original row, use `resample` with a one-hour rule.

This might look something like this:

# df.columns = pd.to_datetime(df.columns)  # if needed
ndf = df.stack() \
    .groupby(level=0).resample(level=1, rule='H') \
    .mean()

This would convert a sample data frame like this:

>>> df = pd.DataFrame({
...     '00:15': [1, 2],
...     '00:30': [3, 4],
...     '00:45': [5, 6],
...     '01:00': [7, 8],
... })
>>> df
   00:15  00:30  00:45  01:00
0      1      3      5      7
1      2      4      6      8

Into:

>>> df.columns = pd.to_datetime(df.columns)  # needed here: resample requires datetime labels
>>> df.stack().groupby(level=0).resample(level=1, rule='H').mean()
0  2023-08-23 00:00:00    3.0
   2023-08-23 01:00:00    7.0
1  2023-08-23 00:00:00    4.0
   2023-08-23 01:00:00    8.0
dtype: float64

I would not store data in this format, with the times as columns. It isn't tidy: a tidy layout has one row per observation.
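As a rough illustration of what a long ("tidy") layout could look like, here is a minimal sketch using `melt`. The sample values and the column names `row`, `time` and `value` are made up for the example, not taken from the question.

```python
import pandas as pd

# Hypothetical sample: two rows, two quarter-hour columns.
df = pd.DataFrame({"00:15": [1, 2], "00:30": [3, 4]})

# Keep the original row number as an identifier, then turn every
# time column into its own (row, time, value) record.
long_df = (
    df.rename_axis("row")
      .reset_index()
      .melt(id_vars="row", var_name="time", value_name="value")
)
print(long_df)
```

Each reading now sits on its own row, so grouping or resampling by time no longer requires reshaping first.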

Because of the groupby, this solution keeps each original row as its own unit of observation. You might also want to clean up the time labels: pandas stores them as full datetimes, so a date gets attached to every time. I think that issue is out of scope here.
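If the wide layout and plain `HH:MM` labels are wanted back, one possible cleanup is sketched below. It is a variant rather than the exact code above: it transposes instead of stacking, parses the labels with an explicit `format="%H:%M"` (an assumption about how the columns are named), and uses the `"60min"` rule. The sample values are made up.

```python
import pandas as pd

# Hypothetical sample: two rows, five quarter-hour columns labelled "HH:MM".
df = pd.DataFrame({
    "00:00": [1, 2],
    "00:15": [3, 4],
    "00:30": [5, 6],
    "00:45": [7, 8],
    "01:00": [9, 10],
})

# Parse the labels so resample can work on them; the date part is a placeholder.
wide = df.copy()
wide.columns = pd.to_datetime(wide.columns, format="%H:%M")

# Transpose so time becomes the index, average per hour, transpose back.
hourly = wide.T.resample("60min").mean().T

# Drop the placeholder date by formatting the labels as "HH:MM" strings again.
hourly.columns = hourly.columns.strftime("%H:%M")
print(hourly)
```

The result has one row per original row and one column per hour, labelled `00:00`, `01:00`, and so on.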

score 0 · Answer 2 · answered Aug 25 '23 at 06:37

Sorry, I forgot to check back here. Thanks for your help, everyone. My solution looks like this now:

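# All 96 quarter-hour labels: "00:00", "00:15", ..., "23:45"
time_columns = [f"{h:02d}:{m:02d}" for h in range(24) for m in range(0, 60, 15)]

# Keep only the time columns; copy so relabelling does not touch newdf
include_columns = [col for col in newdf.columns if col in time_columns]
filtered_df = newdf[include_columns].copy()

# Relabel each column with its hour ("00" ... "23") so the four quarters share a label
filtered_df.columns = [col.split(':')[0] for col in filtered_df.columns]

# Average the columns that share a label. groupby(..., axis=1) is deprecated
# since pandas 2.1, so group the transposed frame and transpose back.
hourly_df = filtered_df.T.groupby(level=0).mean().T

# Turn the hour labels back into "HH:00"
hourly_df.columns = [f"{hour}:00" for hour in hourly_df.columns]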
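To check the column-grouping idea on something small, here is a self-contained sketch with made-up data: two rows, an extra `id` column (hypothetical, standing in for any non-time columns), and two hours of quarter-hour readings. It groups the transposed frame rather than passing `axis=1` to `groupby`.

```python
import pandas as pd

# Hypothetical sample: 8 quarter-hour columns ("00:00" ... "01:45") plus an id column.
labels = [f"{h:02d}:{m:02d}" for h in range(2) for m in range(0, 60, 15)]
newdf = pd.DataFrame([list(range(8)), list(range(8, 16))], columns=labels)
newdf.insert(0, "id", ["a", "b"])

# Select the time columns and relabel each one with its hour.
time_cols = [c for c in newdf.columns if c in labels]
filtered = newdf[time_cols].copy()
filtered.columns = [c.split(":")[0] for c in time_cols]

# Average columns that share an hour label, via the transposed frame.
hourly = filtered.T.groupby(level=0).mean().T
hourly.columns = [f"{h}:00" for h in hourly.columns]

# Put the non-time column back next to the hourly averages.
result = pd.concat([newdf[["id"]], hourly], axis=1)
print(result)
```

For the first row the hour `00` average is (0 + 1 + 2 + 3) / 4 = 1.5, which matches the `00:00` column of `result`.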
