I have a df that looks like this:

measurement_3      329
measurement_4      409
measurement_5      508
measurement_6      624
measurement_7      720
measurement_8      846
measurement_9      904
measurement_10    1067
measurement_11    1136
measurement_12    1240
measurement_13    1303
measurement_14    1440
measurement_15    1542
measurement_16    1678
measurement_17    1740

How do I iterate through the column names so that I can get the mean of each column without writing out every column by hand?

I have done the following, but would like a more optimised solution using a for loop:

# Mean
dataset['loading'].fillna(dataset['loading'].mean(), inplace=True)

# Mean + std

dataset['measurement_3'].fillna(dataset['measurement_3'].mean() + dataset['measurement_3'].std(), inplace=True)
dataset['measurement_4'].fillna(dataset['measurement_4'].mean() + dataset['measurement_4'].std(), inplace=True)
dataset['measurement_5'].fillna(dataset['measurement_5'].mean() + dataset['measurement_5'].std(), inplace=True)
# continues to measurement_17
zampoan
  • Try `df.mean(axis=0)`. The `axis=0` argument calculates the mean down each column, while `axis=1` calculates the mean across each row, so `axis=0` is the one you want. – Shisui Otsutsuki Sep 12 '22 at 04:49
  • For more clarity: (a) axis 0 acts on all the rows in each column; (b) axis 1 acts on all the columns in each row. – Shisui Otsutsuki Sep 12 '22 at 04:54
  • Also, [don't use `inplace=True`](https://stackoverflow.com/a/59242208/11659881). – Kraigolas Sep 12 '22 at 04:55
  • A for loop is basically never the best option when it comes to pandas. Using one usually means you're throwing out the point of using pandas in the first place... for loops are NOT faster than vectorized functions. – BeRT2me Sep 12 '22 at 05:09
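To make the `axis` comments above concrete, here is a minimal sketch on a made-up two-column frame. The values are hypothetical and only the column names are borrowed from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "measurement_3": [1.0, 2.0, 3.0],
    "measurement_4": [10.0, 20.0, 30.0],
})

# axis=0 collapses the rows: one mean per column.
col_means = df.mean(axis=0)

# axis=1 collapses the columns: one mean per row.
row_means = df.mean(axis=1)

print(col_means)
print(row_means)
```

`col_means` is indexed by column name, which is what a per-column fill value needs.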

2 Answers

Please say no to looping...

measures = df.filter(like='measurement')
df[measures.columns] = measures.fillna(measures.mean() + measures.std())
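As a rough, self-contained illustration of the two lines above, the sketch below runs them on a small made-up frame. The data values are assumptions, and the `loading` column is included only to show that `filter(like='measurement')` leaves it alone.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values.
df = pd.DataFrame({
    "loading": [100.0, np.nan, 120.0],
    "measurement_3": [1.0, 3.0, np.nan],
    "measurement_4": [np.nan, 10.0, 14.0],
})

# Select every column whose name contains "measurement".
measures = df.filter(like="measurement")

# One fill value per column: that column's mean plus its standard deviation.
fill_values = measures.mean() + measures.std()

# fillna with a Series matches fill values to columns by name.
df[measures.columns] = measures.fillna(fill_values)

print(df)
```

Because `fill_values` is a Series indexed by column name, each column gets its own fill value in a single call, with no explicit loop.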
BeRT2me

for i in range(3,18):
    dataset[f'measurement_{i}'].fillna(dataset[f'measurement_{i}'].mean() + dataset[f'measurement_{i}'].std(), inplace=True)

Use a for loop with an f-string to build each column name and access the corresponding DataFrame column.
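If a loop is still preferred, one possible variation (not the answerer's code) assigns the result back instead of calling `fillna(..., inplace=True)` on a selected column, in line with the comment on the question advising against `inplace=True`. Recent pandas versions warn about that chained in-place pattern. The toy data and the shortened range are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real dataset runs from measurement_3 to measurement_17.
dataset = pd.DataFrame({
    "measurement_3": [1.0, 3.0, np.nan],
    "measurement_4": [np.nan, 10.0, 14.0],
})

# For the full dataset in the question this would be range(3, 18).
for i in range(3, 5):
    col = f"measurement_{i}"
    fill_value = dataset[col].mean() + dataset[col].std()
    # Assign back rather than relying on inplace=True on a column selection.
    dataset[col] = dataset[col].fillna(fill_value)

print(dataset)
```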

Deven Ramani