I'm learning Python. I'm trying to identify rows of data where the string value contains a special character.

import pandas as pd
cn = pd.read_excel(f"../Files/df.xlsx", sheet_name='Values')
cn = cn[['DestinationName']]
special_characters = "!@#$%^&*()-+?_=,<>/"

cn['Special Characters'] = ["Y" if any(c in special_characters for c in cn) else "N"]

Basically, I'd like to either display only the rows that include any of the special characters, or create a separate column that shows Yes (the value includes a special character) or No. For example, Red & Blue has the "&" character so it should be flagged as Yes, while RedBlue shouldn't.

I'm a little stuck, and any help would be appreciated.

rulans

1 Answer

I would recommend using sets for this task:

  1. Create a set from your string of special characters.
  2. Create a new column that holds the following boolean: "the intersection of special_characters_set and the characters of the string in column DestinationName is non-empty".

It should look like this:

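# A string is already iterable, so set(...) splits it into characters directly.
special_characters_set = set(special_characters)
# set.intersection returns the shared characters; this assumes every value is a string.
cn["Special Characters"] = cn["DestinationName"].apply(lambda x: len(set(x).intersection(special_characters_set)) != 0)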

Where

# list('hello') = ['h', 'e', 'l', 'l', 'o'] # ordered, keeps repetitions
# set(list('hello')) = {'h', 'e', 'l', 'o'} # unordered, no repetitions
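
To check the set logic on its own, outside pandas, here is a minimal sketch. The helper name has_special and the two sample strings are made up for illustration; the character list is the one from the question.

```python
special_characters = "!@#$%^&*()-+?_=,<>/"
special_characters_set = set(special_characters)


def has_special(text):
    """Return True if text shares at least one character with the set."""
    # The & operator on two sets is the same as set.intersection.
    return bool(set(text) & special_characters_set)


# Map each sample name to the "Y"/"N" flag the question asks for.
flags = {name: "Y" if has_special(name) else "N" for name in ["Red & Blue", "RedBlue"]}
print(flags)  # {'Red & Blue': 'Y', 'RedBlue': 'N'}
```

The same function could be passed to .apply() in place of the lambda if you prefer a named helper.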

Keep in mind that .apply() calls a Python function once per row, so it is usually not the most computationally efficient way to process a large DataFrame.
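
If the per-row cost matters, one possible alternative (not part of the approach above) is the vectorized Series.str.contains with a regex character class. This is only a sketch: the small DataFrame stands in for the Excel sheet from the question, and the names pattern, mask and flagged are chosen here for illustration.

```python
import re

import pandas as pd

special_characters = "!@#$%^&*()-+?_=,<>/"
# Build a regex character class; re.escape makes characters such as ^, - and ) literal.
pattern = "[" + re.escape(special_characters) + "]"

# Stand-in data; in the question this column comes from pd.read_excel.
cn = pd.DataFrame({"DestinationName": ["Red & Blue", "RedBlue", "A-B", None]})

# na=False treats missing values as "no special character" instead of propagating NaN.
mask = cn["DestinationName"].str.contains(pattern, regex=True, na=False)

# Option 1: add a Y/N column.
cn["Special Characters"] = mask.map({True: "Y", False: "N"})

# Option 2: keep only the rows that contain a special character.
flagged = cn[mask]
print(flagged)
```

Both outputs come from the same boolean mask, so you can produce the flag column, the filtered rows, or both.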

ewenmichel