How to replace values greater than 30 for missing value?

Question

I need to set all the values that are greater than 30 in a dataset as missing values. My dataset is huge, but here is the format of my df:

date	A	B
01/01/2021	20	5
01/02/2021	15	20
01/01/2022	17	30
01/01/2023	30	40

expected result:

date	A	B
01/01/2021	20	5
01/02/2021	15	20
01/01/2022	17	NAN
01/01/2023	NAN	NAN

Would be also nice to count how many of the values are bigger than 30, in this case, 3.

As asked below, I have tried the code:

df= df.apply(lambda x: [y if y <= 30 else NAN for y in x])

Output:

TypeError: '>=' not supported between instances of 'str' and 'int'

he said "example of my df". Can assume that this is a pandas dataframe. Read before downvote. — DevOps, Jan 19 '22 at 15:23
also, this question has already been answered : https://stackoverflow.com/questions/43757977/replacing-values-greater-than-a-number-in-pandas-dataframe. If you can't get to the solution yourself please post part of your code in your question :) — DevOps, Jan 19 '22 at 15:24
@timgeb I get the error "TypeError: '>=' not supported between instances of 'str' and 'int' " — user17717499, Jan 19 '22 at 15:34

score 0 · Answer 1 · answered Jan 19 '22 at 15:33

0

you should try:

df[df[["A", "B"]].astype(int)>=30] = pd.NA

answered Jan 19 '22 at 15:33

Salvatore Daniele Bianco

2,496
1
8
22

In my original dataframe I have a very huge amount of columns, so I cannot write the name of each one in my command. I have tried: df3 [df3 [ [ ] ].astype (int)>=3] = pd.NA which returns " TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value" – user17717499 Jan 19 '22 at 15:43
'columns = df.columns[df.columns!="date"]' and after 'df[df[columns].astype(int)>=30] = pd.NA', but it will work only if all other columns except "date" are numeric columns – Salvatore Daniele Bianco Jan 19 '22 at 16:51
The first line of code worked. Even though all my other columns are numeric, they contain some values non-numeric (NAN), because of that, the second line of code results in "IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer". Any idea on how to get around it? – user17717499 Jan 19 '22 at 20:25
Ok sorry for that it think that in this case you must cast to 'float' instead of 'int'. Try and let me know. Thanks – Salvatore Daniele Bianco Jan 20 '22 at 08:50
I have tried it, but it returns me "ValueError: cannot reindex from a duplicate axis!" . I do not understand what this error means. Do I have duplicated column names? I run " df3.columns.duplicated().any() ", and returned me "True" – user17717499 Jan 20 '22 at 10:05

score 0 · Answer 2 · answered Jan 19 '22 at 15:42

0

You can replace the values in individual columns greater than or equal to 30 with 'NAN', and then count them.

df = pd.DataFrame(d)
df.A[df.A >= 30] = 'NAN'
df.B[df.B >= 30] = 'NAN'
print(df)
print('Values >= 30 in column A =', df.A.value_counts()['NAN'])
print('Values >= 30 in column B =', df.B.value_counts()['NAN'])

answered Jan 19 '22 at 15:42

Aman Sharma

44
3

I have a huge amount of columns, and so I cannot do it like that. – user17717499 Jan 19 '22 at 15:47

How to replace values greater than 30 for missing value?

2 Answers2