If a special character exists, then display?

Question

I'm learning Python. I'm trying to identify rows of data where the string value includes a special character.

import pandas as pd
cn = pd.read_excel(f"../Files/df.xlsx", sheet_name='Values')
cn = cn[['DestinationName']]
special_characters = "!@#$%^&*()-+?_=,<>/"

cn['Special Characters'] = ["Y" if any(c in special_characters for c in cn) else "N"]

Basically, I'd like to either display only the rows that include any of the special characters, or create a separate column showing Yes (the value includes a special character) or No. For example, Red & Blue contains the "&" character, so it should be flagged as Yes, while RedBlue shouldn't.

I'm a little stuck, and any help would be appreciated.

Did you check [this post](https://stackoverflow.com/questions/11350770/filter-pandas-dataframe-by-substring-criteria)? — Ignatius Reilly, Oct 14 '22 at 19:47

ewenmichel · Accepted Answer · 2022-10-14T20:10:13.573

I would recommend using sets for this specific task:

Create a set from your string of special characters.
Create a new column containing the boolean "the intersection of special_characters_set and the set of characters in the DestinationName value is non-empty".

It should look like this:

special_characters_set = set(list(special_characters))
# True when the value shares at least one character with special_characters_set
cn["Special Characters"] = cn["DestinationName"].apply(lambda x: len(set(list(x)).intersection(special_characters_set)) != 0)

Where

# list('hello') = ['h', 'e', 'l', 'l', 'o']  # ordered, keeps repetitions
# set(list('hello')) = {'h', 'e', 'l', 'o'}  # unordered, no repetitions
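The question asks for a Y/N column or a filtered view rather than True/False. A minimal sketch of both, assuming a small hypothetical DataFrame in place of the Excel file: it uses the same set idea (via set.isdisjoint), and it also shows that the list comprehension from the question works once it iterates over the column's values. Iterating over cn itself yields its column labels, which is why the original attempt fails.

```python
import pandas as pd

# Hypothetical sample data standing in for the 'Values' sheet of df.xlsx
cn = pd.DataFrame({"DestinationName": ["Red & Blue", "RedBlue", "Green-Yellow", "Purple"]})
special_characters = "!@#$%^&*()-+?_=,<>/"
special_characters_set = set(special_characters)

# Iterate over the column's values (not over cn, which yields column labels);
# str() guards against non-string cells such as NaN from empty Excel cells
cn["Special Characters"] = [
    "Y" if not special_characters_set.isdisjoint(str(name)) else "N"
    for name in cn["DestinationName"]
]

# Keep only the rows flagged "Y"
flagged = cn[cn["Special Characters"] == "Y"]
print(flagged)
```

set.isdisjoint accepts any iterable, so it can take the string directly and stops at the first shared character instead of building the full intersection.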

Keep in mind that .apply() is not the most computationally efficient way to manipulate DataFrames, since it calls the Python lambda once per row.
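A common alternative, along the lines of the substring-filtering post linked in the comments, is Series.str.contains with a regex character class built from the special characters. This is a sketch on hypothetical sample data. Whether it is noticeably faster than .apply() depends on the data and the pandas version, so it is worth timing on the real file.

```python
import re
import pandas as pd

# Hypothetical sample data; None stands in for an empty Excel cell
cn = pd.DataFrame({"DestinationName": ["Red & Blue", "RedBlue", "Green-Yellow", None]})
special_characters = "!@#$%^&*()-+?_=,<>/"

# Build a regex character class; re.escape protects characters such as ^ and -
pattern = "[" + re.escape(special_characters) + "]"

# na=False treats missing names as "no special character"
has_special = cn["DestinationName"].str.contains(pattern, regex=True, na=False)

# Y/N column, plus a view of only the matching rows
cn["Special Characters"] = has_special.map({True: "Y", False: "N"})
print(cn[has_special])
```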

