I'm looking for a way to quickly and effectively filter through a dataframe column and remove values that don't meet a condition.

Say I have a column with the numbers 4, 5 and 10. I want to go through the column and replace any number above 7 with 0. How would I go about this?

Quantitative

2 Answers

You're talking about two separate things: filtering and value replacement. Both have their uses and end up looking similar, but for filtering I'll point to this great answer.
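
To make the distinction concrete, here is a minimal sketch, assuming pandas and a small frame with the same values as the example used in this answer. Filtering drops rows, while replacement keeps every row and changes values. The names `filtered` and `replaced` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 4, 10, 5, 10], "B": [10, 2, 1, 9, 3]},
                  index=[1, 2, 3, 4, 5])

# Filtering: keep only the rows where A is at most 7 (other rows are dropped).
filtered = df[df["A"] <= 7]

# Replacement: keep every row, but overwrite values of A above 7 with 0.
replaced = df.copy()
replaced.loc[replaced["A"] > 7, "A"] = 0
```

Here `filtered` has three rows left, while `replaced` still has all five.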

Let's say our data frame is called df and looks like

    A   B
1   4  10
2   4   2
3  10   1
4   5   9
5  10   3

Column A fits your statement of a column only having values 4, 5, 10. If you wanted to replace numbers above 7 with 0, this would do it:

df["A"] = [0 if x > 7 else x for x in df["A"]]

If you read through the right-hand side, it cleanly explains what it is doing. It helps to include parentheses to separate the "what to do" from the "what you're doing it over":

df["A"] = [(0 if x > 7 else x) for x in df["A"]]
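
For anyone who wants to run this end to end, a self-contained sketch follows. It assumes pandas is installed and rebuilds the example frame by hand.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 4, 10, 5, 10], "B": [10, 2, 1, 9, 3]},
                  index=[1, 2, 3, 4, 5])

# Walk over column A and swap any value above 7 for 0.
df["A"] = [(0 if x > 7 else x) for x in df["A"]]

print(df["A"].tolist())  # [4, 4, 0, 5, 0]
```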

If you want to do a manipulation over multiple columns, then utilizing zip allows you to do it easily. For example, if you want the sum of columns A and B then:

df["sum"] = [a + b for a, b in zip(df["A"], df["B"])]
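
A runnable sketch of the same idea, assuming pandas and the example frame from above. It also compares the result with plain column arithmetic, which should give the same values without an explicit loop. The name `vectorized` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 4, 10, 5, 10], "B": [10, 2, 1, 9, 3]},
                  index=[1, 2, 3, 4, 5])

# zip pairs up the values of A and B row by row; unpacking each pair
# into a and b is equivalent to indexing it as x[0] and x[1].
df["sum"] = [a + b for a, b in zip(df["A"], df["B"])]

# Adding the two columns directly gives the same values.
vectorized = df["A"] + df["B"]

print(df["sum"].tolist())  # [14, 6, 11, 14, 13]
```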

Take care when you overwrite data, since this removes information. It's good practice to keep the transformed data in separate columns so you can trace back when something inevitably goes wonky.
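
As a small sketch of that advice, assuming pandas and the example frame, the replacement can be written to a new column so the original stays available. The column name `A_capped` is a made-up name for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 4, 10, 5, 10], "B": [10, 2, 1, 9, 3]},
                  index=[1, 2, 3, 4, 5])

# Store the transformed values in a new column (hypothetical name)
# and leave column A untouched.
df["A_capped"] = [0 if x > 7 else x for x in df["A"]]
```

Both the original `A` and the transformed `A_capped` are then available side by side.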

Paul Raff

There are many options. One possibility for if-then logic is np.where:

import pandas as pd
import numpy as np

df = pd.DataFrame({'x': [1, 200, 4, 5, 6, 11],
                   'y': [4, 5, 10, 24, 4, 3]})
df['y'] = np.where(df['y'] > 7, 0, df['y'])
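
As a sanity check, here is a self-contained sketch of the same replacement with its result. It also shows `Series.mask`, a pandas method that replaces values where the condition is True. Using it here is an assumption about what some readers may prefer, and the column name `y_mask` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 200, 4, 5, 6, 11],
                   'y': [4, 5, 10, 24, 4, 3]})

# np.where(condition, value_if_true, value_if_false), applied element-wise.
df['y_where'] = np.where(df['y'] > 7, 0, df['y'])

# Series.mask replaces the entries where the condition holds.
df['y_mask'] = df['y'].mask(df['y'] > 7, 0)

print(df['y_where'].tolist())  # [4, 5, 0, 0, 4, 3]
```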
PalimPalim