Creating a new column in Pandas

Question

Thank you in advance for taking the time to help me! The code is provided below.

I am trying to average the first 3 columns and insert the result as a new column labeled 'Topsoil'. What is the best way to do that?

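import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
# Use the second CSV column as the index, parsed as dates, so the data can be resampled
raw_data = pd.read_csv('all-deep-soil-temperatures.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()
# The step that selects a single station is not shown in the post; as a stand-in,
# all rows are used so that df_selected_station is defined
df_selected_station = df_all_stations.copy()
# Forward-fill gaps (same effect as the deprecated fillna(method='ffill'))
df_selected_station = df_selected_station.ffill()
# Daily means, plus the day of the year for each day
df_selected_station_D = df_selected_station.resample(rule='D').mean()
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear
# Average each day of the year across all years; 'Day' becomes the index
mean = df_selected_station_D.groupby(by='Day').mean()
mean['Day'] = mean.index
#mean.head()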
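
Since the CSV isn't included here, the following is a minimal sketch of the same steps on synthetic data. The four depth column names, the six-hourly sampling, and the random values are all assumptions; the point is only to show the shape of `mean` that the answers below work on: one row per day of the year, the depth columns first and `Day` appended last.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one station: two non-leap years of six-hourly readings
# at four depths (column names are assumed, not taken from the real CSV)
index = pd.date_range('2018-01-01', '2019-12-31 18:00', freq=pd.offsets.Hour(6))
rng = np.random.default_rng(0)
df_selected_station = pd.DataFrame(
    rng.normal(10, 2, size=(len(index), 4)),
    index=index,
    columns=['5 cm', '10 cm', '15 cm', '20 cm'],
)
# Blank out a few readings so the forward fill has something to do
df_selected_station.iloc[5:10, 0] = np.nan

# Same steps as in the question
df_selected_station = df_selected_station.ffill()
df_selected_station_D = df_selected_station.resample(rule='D').mean()
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear
mean = df_selected_station_D.groupby(by='Day').mean()
mean['Day'] = mean.index

print(mean.shape)           # (365, 5)
print(list(mean.columns))   # ['5 cm', '10 cm', '15 cm', '20 cm', 'Day']
```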

score 1 · Answer 1 · answered Sep 30 '20 at 05:09

Try this, which selects the three depth columns by name:

mean['Topsoil'] = mean[['5 cm', '10 cm', '15 cm']].mean(axis=1)
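
A minimal sketch of the by-name approach on a small hand-made frame (the values are invented for illustration). Note that `DataFrame.mean` skips NaN by default (`skipna=True`), so a row with a missing reading is averaged over the columns that do have values.

```python
import numpy as np
import pandas as pd

# Toy daily means; the depth column names mirror the answer, the values are made up
mean = pd.DataFrame(
    {'5 cm': [10.0, 12.0], '10 cm': [11.0, np.nan], '15 cm': [12.0, 14.0], '20 cm': [9.0, 9.5]},
    index=pd.Index([1, 2], name='Day'),
)

# Select the three columns by label, then average across each row (axis=1)
mean['Topsoil'] = mean[['5 cm', '10 cm', '15 cm']].mean(axis=1)
print(mean['Topsoil'].tolist())  # [11.0, 13.0]; row 2 averages only 12.0 and 14.0
```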

Subasri sridhar

score 0 · Answer 2 · answered Sep 30 '20 at 05:03

df['new column'] = (df['col1'] + df['col2'] + df['col3'])/3
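
A sketch contrasting this explicit formula with `DataFrame.mean(axis=1)` on the same toy data (the column names `col1` to `col3` follow the answer; the values are invented). The explicit sum propagates NaN, whereas `mean` skips it, so which one fits depends on how missing readings should be treated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0, 4.0], 'col2': [2.0, np.nan], 'col3': [3.0, 6.0]})

# Plain arithmetic: any NaN in a row makes that row's result NaN
df['new column'] = (df['col1'] + df['col2'] + df['col3']) / 3

# Row-wise mean: NaN values are skipped (skipna=True by default)
df['row mean'] = df[['col1', 'col2', 'col3']].mean(axis=1)
print(df)
```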

callmeanythingyouwant

score 0 · Answer 3 · answered Sep 30 '20 at 05:07

You could use the apply method in the following way:

mean['Topsoil'] = mean.apply(lambda row: np.mean(row.iloc[0:3]), axis=1)

You can read about the apply method at the following link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

The logic is that apply runs the same function repeatedly along the chosen axis: with axis=1, the function is called once for each row.
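
To make the row-by-row behavior concrete: with `axis=1`, `apply` passes each row to the function as a Series indexed by the column labels. The small sketch below records which row labels the function sees (the toy frame, the `calls` list, and the helper name `first_three_mean` are illustrative additions) and checks that the result matches the vectorized `iloc[:, :3].mean(axis=1)`.

```python
import numpy as np
import pandas as pd

# Toy frame with four depth columns (names and values are made up)
mean = pd.DataFrame(
    np.arange(16, dtype=float).reshape(4, 4),
    columns=['5 cm', '10 cm', '15 cm', '20 cm'],
)

calls = []

def first_three_mean(row):
    # row is a Series whose index is the column labels; row.name is the row label
    calls.append(row.name)
    return np.mean(row.iloc[0:3])

mean['Topsoil'] = mean.apply(first_three_mean, axis=1)

print(calls)                      # the row labels the function was called with
print(mean['Topsoil'].tolist())   # [1.0, 5.0, 9.0, 13.0]
```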

Note: it is not wise to give a data structure the name of a function; in your case `mean_df` would be a better name than `mean`.

David

`apply` is a loop under the hood, so for performance reasons it is best not to use it when vectorized alternatives exist. – jezrael Sep 30 '20 at 05:14
https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – jezrael Sep 30 '20 at 05:17
@jezrael Thanks, I didn't know that this is the case. I will try to avoid this issue. – David Sep 30 '20 at 05:19
Arithmetic operations are the textbook case where `apply` should be avoided. – jezrael Sep 30 '20 at 05:20
@jezrael So, to be clear: for arithmetic operations, will `iloc` always be faster and more appropriate? – David Sep 30 '20 at 05:21
No, it means `df.apply(np.mean, axis=1)` is slower than `df.mean(axis=1)`; `iloc` here is only for selecting the columns. – jezrael Sep 30 '20 at 05:22
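
jezrael's point can be checked with a rough timing sketch using `timeit`. The random frame and its size (10,000 rows) are arbitrary choices, and absolute timings depend on the machine and pandas version, so none are quoted here; the sketch only asserts that both approaches give the same numbers.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(10_000, 4)), columns=['a', 'b', 'c', 'd'])

def with_apply():
    # Row-by-row: each row is built as a Series and passed to np.mean
    return df.apply(np.mean, axis=1)

def vectorized():
    # One call that pandas evaluates over the whole frame at once
    return df.mean(axis=1)

# Both approaches produce the same values
assert np.allclose(with_apply(), vectorized())

print('apply:     ', timeit.timeit(with_apply, number=3))
print('vectorized:', timeit.timeit(vectorized, number=3))
```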

score 0 · Answer 4 · answered Sep 30 '20 at 05:10

Use DataFrame.iloc to select the first 3 columns by position, then take their row-wise mean:

mean['Topsoil'] = mean.iloc[:, :3].mean(axis=1)
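
A small sketch of why positional selection depends on column order: `iloc[:, :3]` takes whatever the first three columns happen to be. In the question's code, `Day` is appended last, so the first three are depth columns. The toy frame and the reordered copy below are illustrative assumptions.

```python
import pandas as pd

# Toy daily means with made-up values; 'Day' is appended last, as in the question
mean = pd.DataFrame(
    {'5 cm': [10.0, 12.0], '10 cm': [11.0, 13.0], '15 cm': [12.0, 14.0], '20 cm': [8.0, 9.0]},
    index=pd.Index([1, 2], name='Day'),
)
mean['Day'] = mean.index

# Positional: the first three columns, whatever their names
mean['Topsoil'] = mean.iloc[:, :3].mean(axis=1)
print(mean.iloc[:, :3].columns.tolist())  # ['5 cm', '10 cm', '15 cm']

# If 'Day' came first, the same slice would include the day number in the average
reordered = mean[['Day', '5 cm', '10 cm', '15 cm']]
print(reordered.iloc[:, :3].columns.tolist())  # ['Day', '5 cm', '10 cm']
```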

edited Sep 30 '20 at 05:20

answered Sep 30 '20 at 05:10

jezrael
