Pandas remove every entry with a specific value

Question

I would like to go through every row (entry) in my df and remove every entry that contains the value "" (an empty string).

So if my data set is:

Name Gender Age
Jack         5
Anna    F    6
Carl    M    7
Jake    M    7

Therefore Jack would be removed from the dataset.

On another note, I would also like to remove entries that have the value "Unspecified" or "Undetermined".

Eg:

Name Gender Age    Address
Jack         5    *address*
Anna    F    6    *address* 
Carl    M    7   Undetermined
Jake    M    7   Unspecified

Now, Jack will be removed due to empty field. Carl will be removed due to the value Undetermined present in a column. Jake will be removed due to the value Unspecified present in a column.

For now, this has been my approach but I keep getting a TypeError.

list = []
for i in df.columns:
    if df[i] == "":
        # everytime there is an empty string, add 1 to list
        list.append(1)
# count list to see how many entries there are with empty string
len(list)

Please help me with this. I would prefer a solution that uses a for loop, since my actual dataset has about 22 columns and 9000+ rows.

Note - I do understand that other questions like this have been asked; it's just that none of them apply to my situation. Most of them are only useful for a few columns, and I do not wish to hardcode all 22 columns.

Edit - Thank you for all your feedback, you all have been incredibly helpful.

TheCSGuy · Answer 1 · 2022-09-30T10:21:45.797

1

To delete a row based on a condition use the following:

df = df.drop(df[condition].index)

For example, df = df.drop(df[df["Age"] == 5].index) will drop every row where Age is 5.
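As a minimal sketch of that pattern (the small sample frame here is made up for illustration and is not the asker's real data):

```python
import pandas as pd

# Hypothetical sample data, only for demonstration
df = pd.DataFrame({"Name": ["Jack", "Anna", "Carl"], "Age": [5, 6, 7]})

# Boolean Series: True for every row that should be removed
condition = df["Age"] == 5

# df[condition].index holds the labels of the matching rows;
# drop() removes the rows with those labels
df = df.drop(df[condition].index)
print(df)
```

Note that drop works on index labels, so this assumes the index labels are unique.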

I came across a post on the same topic dating back to 2017; it should help you understand this more clearly.

edited Sep 30 '22 at 10:21

answered Sep 30 '22 at 10:16


score 1 · Answer 2 · answered Sep 30 '22 at 10:22

1

Regarding question 2, here's how to remove rows with the specified values in a given column:

df = df[~df["Address"].isin(("Undetermined", "Unspecified"))]

answered Sep 30 '22 at 10:22

Bence


Rabinzel · Answer 3 · 2022-09-30T10:34:29.830

1

You can build boolean masks and then filter the df with them:

m1 = df.eq('').any(axis=1) 
# m1 is True if any cell in a row has an empty string

m2 = df['Address'].isin(['Undetermined', 'Unspecified'])
# m2 is True if a row has one of the values in the list in column 'Address'

out = df[~m1 & ~m2] # invert both conditions to get the desired output
print(out)

Output:

   Name Gender  Age    Address
1  Anna      F    6  *address*

Used Input:

import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Anna', 'Carl', 'Jake'],
 'Gender': ['', 'F', 'M', 'M'],
 'Age': [5, 6, 7, 7],
 'Address': ['*address*', '*address*', 'Undetermined', 'Unspecified']}
)
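If the placeholder values can show up in any of the columns, not only in Address, the same masking idea can be applied to the whole frame at once. This is only a sketch: the name unwanted and its contents are assumptions based on the values mentioned in the question.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Anna', 'Carl', 'Jake'],
                   'Gender': ['', 'F', 'M', 'M'],
                   'Age': [5, 6, 7, 7],
                   'Address': ['*address*', '*address*', 'Undetermined', 'Unspecified']})

# Assumed list of values that mark a row as unusable
unwanted = ['', 'Undetermined', 'Unspecified']

# DataFrame.isin checks every cell; any(axis=1) collapses the result to one flag per row
mask = df.isin(unwanted).any(axis=1)

out = df[~mask]  # keep only rows without any unwanted value
print(out)
```

This avoids naming any column explicitly, so it should scale to the 22 columns without hardcoding them.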

edited Sep 30 '22 at 10:34

answered Sep 30 '22 at 10:25


Thank you for this masking technique. I wasn't taught this in class but it's pretty interesting! I'll learn more on this. And thank you for your response. – Hakari Kinji Sep 30 '22 at 11:37
If you want to read more on this and how it works, here is the section of pandas User Guide on [Boolean Indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#boolean-indexing) with different examples. – Rabinzel Sep 30 '22 at 13:00

score 1 · Answer 4 · answered Sep 30 '22 at 10:25

1

Let's assume we have a Pandas DataFrame object df.

To remove every row given your conditions, simply do:

df = df[~((df.Gender == "") | (df.Age == "") | df.Address.isin(["", "Undetermined", "Unspecified"]))]

If the unspecified fields are NaN, you can also do:

df = df.dropna(how="any", axis = 0)
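If the unspecified fields are placeholder strings rather than NaN, one option is to convert them to NaN first and then let dropna do the work. A rough sketch, assuming the three placeholder values from the question and reusing the sample data from this thread:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Jack", "Anna", "Carl", "Jake"],
                   "Gender": ["", "F", "M", "M"],
                   "Age": [5, 6, 7, 7],
                   "Address": ["*address*", "*address*", "Undetermined", "Unspecified"]})

# Assumed placeholder values; adjust to whatever your data actually uses
placeholders = ["", "Undetermined", "Unspecified"]

# Turn every placeholder into NaN, then drop rows containing any NaN
cleaned = df.replace(placeholders, np.nan).dropna(how="any", axis=0)
print(cleaned)
```

Keep in mind this also drops rows that already had genuine NaN values, which may or may not be what you want.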

answered Sep 30 '22 at 10:25

Piotr Krzemiński


Thank you for showing me this. I had no clue how to drop NaN values from every column. – Hakari Kinji Sep 30 '22 at 11:39

score 1 · Answer 5 · answered Sep 30 '22 at 10:25

The answers from @ThatCSFresher and @Bence will help you remove rows based on a single column, which is great!

However, I think your question involves multiple conditions that need to be checked across multiple columns at once. apply with a lambda can probably do the job; try the following code:

import pandas as pd

df = pd.DataFrame({"Name":["Jack","Anna","Carl","Jake"],
                   "Gender":["","F","M","M"],
                   "Age":[5,6,7,7],
                   "Address":["address","address","Undetermined","Unspecified"]})

df["Noise_Tag"] = df.apply(lambda x: "Noise" if ("" in list(x)) or ("Undetermined" in list(x)) or ("Unspecified" in list(x)) else "No Noise",axis=1)
df1 = df[df["Noise_Tag"] == "No Noise"]
del df1["Noise_Tag"]

# Output of df;
    Name    Gender  Age Address   Noise_Tag
0   Jack        5   address       Noise
1   Anna    F   6   address       No Noise
2   Carl    M   7   Undetermined  Noise
3   Jake    M   7   Unspecified   Noise

# Output of df1;
    Name    Gender  Age Address
1   Anna    F   6   address

Glad to Help @JeffThura ... Drop a like / accept the best answer that works for your problem statement... To grow & motivate community... Happy Coding :) — Sachin Kohli, Sep 30 '22 at 11:51

score 1 · Answer 6 · answered Sep 30 '22 at 10:34

Well, OP actually wants to delete any row that has an empty string in any column.

df = df[~(df=="").any(axis=1)] # deletes all rows that have empty string in any column.

If you only want to filter on the Address column, you can use:

df = df[~df["Address"].isin(("Undetermined", "Unspecified"))]

Or, if Undetermined or Unspecified can appear in any column, use the same approach as the first solution in this post, replacing the empty string with Undetermined or Unspecified:

df = df[~((df=="Undetermined") | (df=="Unspecified")).any(axis=1)]
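The question writes the empty value as " ", so it may be worth checking whether the "empty" cells really are "" or contain only whitespace. Here is a sketch that treats both the same; the sample data is made up, and converting everything with astype(str) is just one way to do it:

```python
import pandas as pd

# Hypothetical data: one cell holds a single space, another a true empty string
df = pd.DataFrame({"Name": ["Jack", "Anna", "Carl"],
                   "Gender": [" ", "F", ""],
                   "Age": [5, 6, 7]})

# Convert every column to str so .str.strip() also works on numeric columns.
# Note: real NaN values become the text "nan" here, so they are NOT treated as blank.
stripped = df.astype(str).apply(lambda col: col.str.strip())

# True for rows where any cell is empty after stripping whitespace
blank = stripped.eq("").any(axis=1)

out = df[~blank]  # filter the original frame so the original dtypes are kept
print(out)
```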

R. Baraiya · Answer 7 · 2022-09-30T10:36:56.410

0

Using a lambda function:

Code:

df[df.apply(lambda x: False if (x.Address in ['Undetermined', 'Unspecified'] or '' in list(x)) else True, axis=1)]

Output:

    Name    Gender  Age Address
1   Anna    F       6   *add

edited Sep 30 '22 at 10:36

answered Sep 30 '22 at 10:29


Thanks for this lambda method, it's pretty hard to understand but seems effective. Thank you for your response, I will look more into this method. – Hakari Kinji Sep 30 '22 at 11:38
