How to replace values greater than specific value in dataframe column?

Question

I have a dataset with some outliers in the AGE field. Here are the unique values of that column, sorted:

unique = df_csv['AGE'].unique()
print(sorted(unique))

[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 79, 126, 140, 149, 152, 228, 235, 267]

How can I replace any value greater than 80 with the mean or median of my AGE column?

score 4 · Accepted Answer · answered Nov 21 '20 at 00:05


Since you want to modify a column of a dataframe in place, you should use loc:

 # replace `median` with `mean` if you want
 df_csv.loc[df_csv['AGE']>80,'AGE'] = df_csv['AGE'].median()
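One thing to keep in mind: the median above is computed over the whole column, outliers included. A minimal sketch of a variant that takes the median of the in-range ages only, assuming that is what you want. The sample data and the names `outliers` and `clean_median` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data standing in for df_csv
df_csv = pd.DataFrame({"AGE": [21, 35, 52, 60, 79, 126, 267]})

# Boolean mask marking the rows to replace
outliers = df_csv["AGE"] > 80

# Median over all rows, outliers included (what the one-liner above uses)
full_median = df_csv["AGE"].median()

# Median over the in-range rows only (an assumption, not part of the answer)
clean_median = df_csv.loc[~outliers, "AGE"].median()

# Overwrite only the outlier rows of the AGE column
df_csv.loc[outliers, "AGE"] = clean_median
print(df_csv)
```

On the question's data the two medians may well be close, so whether this matters depends on how many outliers you have.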


Quang Hoang


score 1 · Answer 2 · answered Nov 20 '20 at 23:59


You could do:

series[series > 80] = series.median()
print(series)

Output

0     21
1     22
2     23
3     24
4     25
      ..
58    52
59    52
60    52
61    52
62    52
Length: 63, dtype: int64
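The snippet above does not show how `series` was built. A self-contained sketch, assuming it holds the 63 sorted unique ages from the question:

```python
import pandas as pd

# The 63 unique ages listed in the question: 21..75, then 79 and the outliers
ages = list(range(21, 76)) + [79, 126, 140, 149, 152, 228, 235, 267]
series = pd.Series(ages)

# Median of all 63 values, outliers included
median = series.median()

# Boolean-mask assignment: overwrite every value above 80 in place
series[series > 80] = median
print(series)
```

This works directly on a standalone Series. For a dataframe column, prefer a single `.loc` assignment over chained indexing such as `df_csv['AGE'][mask] = ...`, which may not write back to the dataframe.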


Dani Mesejo


ombk · Answer 3 · 2020-11-21T01:44:25.963

0

median = df_csv['AGE'].median()
# apply returns a new Series, so assign the result back to the column
df_csv['AGE'] = df_csv['AGE'].apply(lambda x: median if x > 80 else x)

Other method: Here

edited Nov 21 '20 at 01:44

answered Nov 20 '20 at 23:59

ombk


To explain what apply does: a lambda is a function without a name (similar to def ..., but more compact). In lambda x, x is each value taken from the column. After the colon comes the expression: median if x > 80, otherwise x is kept as it is. apply goes over every row and performs this check. – ombk Nov 21 '20 at 01:47
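To make the lambda/def comparison concrete, here is a small sketch with both forms side by side. The sample data and the helper name `cap_age` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data standing in for df_csv
df_csv = pd.DataFrame({"AGE": [21, 35, 52, 60, 79, 126, 267]})
median = df_csv["AGE"].median()

# Named function, equivalent to the lambda below
def cap_age(x):
    return median if x > 80 else x

# apply calls the function once per value and returns a new Series;
# the original column is left untouched until you assign the result back
with_def = df_csv["AGE"].apply(cap_age)
with_lambda = df_csv["AGE"].apply(lambda x: median if x > 80 else x)

print(with_lambda)
```

Both calls give the same result. Because apply runs a Python function per element, it is typically slower on large columns than the boolean-mask approaches in the other answers.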
