Pandas: How to set overly large values in columns (bad data) to zero? Should I use an if function or something completely different?

Question

The pandas DataFrame "power" has a datetime index. Its columns Ap1, Ap2, Ap3 and Solar hold float64 values. However, some of the data is bad, and I want to replace all values over a certain threshold (e.g. 100 000) with zero. Here's how the DataFrame looks:

power.head()
power.describe()

                    Ap1     Ap2     Ap3     Solar
Datetime                
2018-01-01 00:00:00 659.18  59.51   120.39  0.0
2018-01-01 00:01:00 600.59  119.93  179.90  0.0
2018-01-01 00:02:00 600.59  119.93  119.93  0.0
2018-01-01 00:03:00 534.67  119.93  59.97   0.0
2018-01-01 00:04:00 600.59  119.93  119.93  0.0


    Ap1             Ap2             Ap3             Solar
max 6.489067e+06    1.167420e+06    2.296201e+06    52433.040000

I'm trying to go through the columns with an if function that would replace the large values with a zero:

def badvalue(x):
    if x > 100000:
        x == 0

power["Ap1"].apply(badvalue)

However, this does nothing to the data, and I understand you probably can't change the values this way anyway (I wish Pandas was this intuitive though!). So what is the easiest/best way to do this with Pandas?

And if I wanted to do this for all columns at the same time instead of just one column, would the method be something different?

Thank you for your help.

The problem with your code is that you don't assign the result to anything. This way your code should work: `power["Ap1"] = power["Ap1"].apply(badvalue)`. But this can be done much more efficiently in pandas: `power.loc[power["Ap1"] > 10000, "Ap1"] = 0` – Niels Henkens, Dec 11 '18 at 11:20
Thank you, the latter example did exactly what was needed. Just out of curiosity, do you know why the first example in your comment changed EVERY value in the column to "None"? – KMFR, Dec 11 '18 at 12:20
Your function badvalue doesn't `return` anything (I didn't notice that before). If you add `return x` to the bottom of your function, it probably does work as expected. – Niels Henkens, Dec 11 '18 at 12:38
I tried with return x earlier too but it still didn't get it right. But it doesn't matter, I'll do it with .loc from now on, thanks again. – KMFR, Dec 11 '18 at 14:13
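
A note on the thread above: `Series.apply` builds a new Series from whatever the function returns, so a function without a `return` yields `None` for every row. In addition, `x == 0` is a comparison rather than an assignment, which may be why adding `return x` alone did not help. Below is a minimal sketch of a working `apply` version next to the `.loc` version. The sample values and the name `THRESHOLD` are assumptions made for illustration, not the asker's data.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
power = pd.DataFrame(
    {"Ap1": [659.18, 6489067.0, 600.59], "Solar": [0.0, 0.0, 52433.04]},
    index=pd.date_range("2018-01-01", periods=3, freq="min"),
)
THRESHOLD = 100_000  # hypothetical cut-off, taken from the question's example

def badvalue(x):
    # Return the replacement value; apply() collects the return values.
    return 0.0 if x > THRESHOLD else x

# apply() returns a new Series, so the result has to be kept or assigned back.
via_apply = power["Ap1"].apply(badvalue)

# Vectorised equivalent: a boolean mask picks the rows, .loc assigns in place.
via_loc = power.copy()
via_loc.loc[via_loc["Ap1"] > THRESHOLD, "Ap1"] = 0.0
```

Both variants give the same column here; the `.loc` form avoids calling a Python function once per row.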

score 1 · Answer 1 · answered Dec 11 '18 at 11:14

Use:

power.Ap1[power.Ap1 > 10000] = 0

Similarly for other columns.
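
For the "all columns at once" part of the question, one possible approach is `DataFrame.mask`, which replaces every entry where a condition is true. The sketch below is hedged: the sample values and the names `THRESHOLD`, `cleaned`, `cols` and `partial` are made up for illustration, and the cut-off follows the question's example of 100 000 rather than the 10000 used above.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
power = pd.DataFrame(
    {
        "Ap1": [659.18, 6489067.0],
        "Ap2": [59.51, 1167420.0],
        "Ap3": [2296201.0, 120.39],
        "Solar": [0.0, 52433.04],
    },
    index=pd.date_range("2018-01-01", periods=2, freq="min"),
)
THRESHOLD = 100_000  # hypothetical cut-off

# mask() replaces entries where the condition is True and returns a new frame.
cleaned = power.mask(power > THRESHOLD, 0)

# Restricting the replacement to a chosen subset of columns.
cols = ["Ap1", "Ap2"]
partial = power.copy()
partial[cols] = partial[cols].mask(partial[cols] > THRESHOLD, 0)
```

`mask` leaves `power` itself untouched, so the result has to be assigned to a name (or back to `power`) to take effect.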

answered Dec 11 '18 at 11:14

meW

Using df.loc[] is preferred for changing values. – Niels Henkens Dec 11 '18 at 11:21
Thanks. Can you direct me to such comparative source :) – meW Dec 11 '18 at 11:22
Just see the warning you get when you run the code: `SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy` – Niels Henkens Dec 11 '18 at 11:33
I pasted the answer after running the code with no warning. Anyways, I too appreciate `loc` :) – meW Dec 11 '18 at 11:37
You're right, my bad. I did a different thing. – Niels Henkens Dec 11 '18 at 11:55
