Iterating over rows to find mean of a data frame in Python

Question

I have a dataframe of 100 random numbers and I would like to find the mean as follows:

mean0 should be the mean of rows 0, 5, 10, ...

mean1 should be the mean of rows 1, 6, 11, 16, ...

...

mean4 should be the mean of rows 4, 9, 14, ...

So far, I am able to find mean0, but I cannot figure out how to iterate the process to obtain the remaining means.

My code is as follows:

import numpy as np
import pandas as pd
import csv

data = np.random.randint(1, 100, size=100)
df = pd.DataFrame(data)

print(df)

df.to_csv('example.csv', index=False)

df1 = df[::5]
print("Every 5th row is:\n", df1)

df2 = df1.mean()
print(df2)

Does this answer your question? [Groupby certain number of rows pandas](https://stackoverflow.com/questions/44035640/groupby-certain-number-of-rows-pandas) — Ignatius Reilly, Jul 08 '22 at 17:42

7shoe · Answer 1 · 2022-07-08T17:53:13.237

2

Since df[::5] is equivalent to df[0::5], you can use df[1::5], df[2::5], df[3::5], and df[4::5] to select the remaining subsets, and then take the mean of each one with df[i::5].mean().

This is not explicitly shown in the examples in the pandas documentation, but it works the same way as list slicing with [start:stop:step].
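As a minimal sketch of the slicing approach described above, the five means can be collected in a loop. The deterministic data (0 to 99 instead of random numbers) and the dictionary name `means` are assumptions made for illustration, so that the result can be checked by hand.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic stand-in for the question's random data (values 0..99)
df = pd.DataFrame(np.arange(100))

# df[i::5] selects rows i, i+5, i+10, ...; [0] then picks the single column
means = {f"mean{i}": df[i::5][0].mean() for i in range(5)}

for name, value in means.items():
    print(name, value)
```

With this input, mean0 averages 0, 5, ..., 95 and each following mean is larger by exactly 1.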

edited Jul 08 '22 at 17:53

answered Jul 08 '22 at 17:46

7shoe


score 1 · Accepted Answer · answered Jul 08 '22 at 18:02


I would use the underlying numpy array:

df[0].to_numpy().reshape(-1, 5).mean(0)

output: array([40.8 , 52.75, 43.2 , 55.05, 47.45])
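To see why the reshape works, here is a small sketch on deterministic data (an assumption; the question uses random numbers, so its output differs from run to run). The names `values` and `table` are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic stand-in for the question's random data (values 0..99)
df = pd.DataFrame(np.arange(100))

values = df[0].to_numpy()

# reshape(-1, 5) lays the 100 values out as 20 rows of 5 columns,
# so column i holds the original rows i, i+5, i+10, ...
table = values.reshape(-1, 5)

# Averaging down each column (axis=0) gives mean0 .. mean4 in one call
means = table.mean(axis=0)
print(means)
```

The reshape only succeeds when the number of rows is a multiple of 5; otherwise NumPy raises a `ValueError`.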

answered Jul 08 '22 at 18:02

mozway


This gives me an error: `raise KeyError(key) from err` ... `KeyError: 0`. I don't understand why. Can you please help me? – A. Gehani Jul 08 '22 at 18:26
Obviously, you should use the name of your column. In the provided example this was `0`. If the name is `'col'` -> `df['col']...` – mozway Jul 08 '22 at 18:44
When I run the command with the code given above, it gives me the desired results, but when I use the same command with `df = pd.read_csv('example.csv')`, it gives me the error. I haven't changed the name of the column in either case. Can you help me understand what is wrong here? – A. Gehani Jul 11 '22 at 23:53
Use `print(df.columns)` to see the column names. If you have `0` use `df[0]`, for `'0'` use `df['0']`. If you have a single column you can use positional indexing: `df.iloc[:,0]`. This is all basic pandas and has little to do with how my solution works. The prerequisite is that you are able to select your column, and that the column contains a multiple of 5 elements. – mozway Jul 12 '22 at 04:45

Thanks! I renamed the column and it worked perfectly for me. Thank you once again. – A. Gehani Jul 12 '22 at 05:59
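The `KeyError` discussed in the comments above can be reproduced with a short sketch. It assumes the CSV is written with `index=False` as in the question, and it uses an in-memory `io.StringIO` buffer instead of `example.csv` so that it runs without touching disk. The name `df_csv` is illustrative.

```python
import io

import numpy as np
import pandas as pd

# Assumption: deterministic stand-in for the question's random data (values 0..99)
df = pd.DataFrame(np.arange(100))
print(df.columns)  # the only column label is the integer 0

# Round-trip through CSV text, as to_csv / read_csv would do with a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_csv = pd.read_csv(buffer)
print(df_csv.columns)  # the label is now the string '0', so df_csv[0] raises KeyError

# Positional indexing works regardless of the column's label
means = df_csv.iloc[:, 0].to_numpy().reshape(-1, 5).mean(axis=0)
print(means)
```

Selecting the column by position with `iloc[:, 0]` sidesteps the label mismatch; `df_csv['0']` would work equally well here.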
