0

Thank you in advance for taking the time to help me! (Code provided below.)

I am trying to average the first 3 columns and insert the result as a new column labeled 'Topsoil'. What is the best way to do that?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
raw_data = pd.read_csv('all-deep-soil-temperatures.csv', index_col=1, parse_dates=True)
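df_all_stations = raw_data.copy()
# The step that picks a station is not shown in the post; using all rows here is an assumption
df_selected_station = df_all_stations.copy()
df_selected_station.ffill(inplace=True)  # forward-fill gaps (fillna(method='ffill') is deprecated)
df_selected_station_D = df_selected_station.resample(rule='D').mean()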
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear
mean=df_selected_station_D.groupby(by='Day').mean()
mean['Day']=mean.index
#mean.head()


Xavier Conzet

4 Answers

1

Try this:

mean['avg3col'] = mean[['5 cm', '10 cm', '15 cm']].mean(axis=1)
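
As a minimal sketch of this label-based approach, the toy frame below stands in for the asker's data. The `soil` name, the '50 cm' column, and all values are assumptions made up for illustration; only the '5 cm', '10 cm' and '15 cm' labels come from the line above.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `mean` frame; the values are invented.
soil = pd.DataFrame(
    {
        "5 cm": [10.0, 12.0],
        "10 cm": [11.0, 13.0],
        "15 cm": [12.0, 14.0],
        "50 cm": [20.0, 20.0],
    }
)

# Row-wise mean of the three named columns, stored under the requested label.
soil["Topsoil"] = soil[["5 cm", "10 cm", "15 cm"]].mean(axis=1)
print(soil)
```

Selecting by label keeps working if the column order changes, which positional selection does not guarantee.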
Subasri sridhar
0
df['new column'] = (df['col1'] + df['col2'] + df['col3'])/3
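
One thing worth checking before choosing the explicit sum: it treats missing values differently from `mean(axis=1)`. The sketch below uses an invented three-column frame (the `NaN` placement is an assumption for illustration) to show the difference.

```python
import numpy as np
import pandas as pd

# Invented data; the second row has a missing value in col2.
df = pd.DataFrame(
    {"col1": [1.0, 4.0], "col2": [2.0, np.nan], "col3": [3.0, 6.0]}
)

# Explicit sum: any NaN in a row makes the whole result NaN.
by_sum = (df["col1"] + df["col2"] + df["col3"]) / 3

# DataFrame.mean skips NaN by default (skipna=True) and averages what is left.
by_mean = df[["col1", "col2", "col3"]].mean(axis=1)

print(by_sum)
print(by_mean)
```

If the data was forward-filled beforehand, as in the question, the two forms should agree; otherwise pick the one whose NaN behavior you actually want.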
callmeanythingyouwant
0

You could use the apply method in the following way:

mean['Topsoil'] = mean.apply(lambda row: np.mean(row[0:3]), axis=1)

You can read about the apply method in the following link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

The logic is that you perform the same task along a specific axis multiple times.
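
To make "the same task along a specific axis" concrete, here is a rough sketch of what `apply(..., axis=1)` amounts to: a Python-level loop over rows. The `frame` data is invented, and `row.iloc[:3]` is used instead of `row[0:3]` to make the positional intent explicit; this is an illustration, not a description of pandas internals.

```python
import numpy as np
import pandas as pd

# Invented data: three columns to average plus one that must be ignored.
frame = pd.DataFrame(
    {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0], "d": [100.0, 200.0]}
)

# apply with axis=1 calls the function once per row.
via_apply = frame.apply(lambda row: np.mean(row.iloc[:3]), axis=1)

# Conceptually the same thing, written as an explicit loop over rows.
via_loop = pd.Series(
    [np.mean(row.iloc[:3]) for _, row in frame.iterrows()],
    index=frame.index,
)

# Vectorized form: select the columns once, then reduce across them.
vectorized = frame.iloc[:, :3].mean(axis=1)

print(via_apply.tolist(), via_loop.tolist(), vectorized.tolist())
```

All three give the same numbers; the vectorized form avoids the per-row function call.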

Note: It is unwise to name data structures after functions; in your case, mean_df would be a better name than mean.

David
  • 1
    `apply` is a loop under the hood; for performance reasons it is best not to use it if vectorized alternatives exist. – jezrael Sep 30 '20 at 05:14
  • https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – jezrael Sep 30 '20 at 05:17
  • @jezrael Thanks, I didn't know that this is the case. I will try to avoid this issue. – David Sep 30 '20 at 05:19
  • This is the prototypical case for avoiding it: arithmetic operations – jezrael Sep 30 '20 at 05:20
  • @jezrael so just to make it clearer, for arithmetic operations `iloc` will always be faster and more appropriate? – David Sep 30 '20 at 05:21
  • No, it means `df.apply(np.mean, axis=1)` is slower than `df.mean(axis=1)`; `iloc` here is only for selecting. – jezrael Sep 30 '20 at 05:22
0

Use DataFrame.iloc to select by position (here, the first 3 columns), then take the mean:

mean['Topsoil'] = mean.iloc[:, :3].mean(axis=1)
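
Plain assignment appends 'Topsoil' as the last column. If the new column should sit right after the three it summarizes, `DataFrame.insert` is one option. The sketch below assumes an invented `temps` frame with a trailing 'Day' column, loosely mirroring the question's layout.

```python
import pandas as pd

# Invented stand-in for the asker's frame: three depth columns, then 'Day'.
temps = pd.DataFrame(
    {
        "5 cm": [10.0, 12.0],
        "10 cm": [11.0, 13.0],
        "15 cm": [12.0, 14.0],
        "Day": [1, 2],
    }
)

# Compute first, so the new column cannot affect the positional selection.
topsoil = temps.iloc[:, :3].mean(axis=1)

# insert() modifies the frame in place; position 3 puts it before 'Day'.
temps.insert(3, "Topsoil", topsoil)
print(temps)
```

Note that `insert` raises a `ValueError` if a column named 'Topsoil' already exists, so rerunning the cell needs a fresh frame.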
jezrael