For the following df (note that the df I am actually working with is raw data read in from a txt file, not the df below, which I created in Python for this example):
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': ['12374' ,'19352','21014','2619','2621','9566','9686','61319','68086','69239','69353', '69373','69491','69535','69582','69691','174572','174637','174646','175286','175390'],
'Category': [' ', ' ', ' ', '???? ?????','? ?',' ','?? ?',' ',' ',' ','?? ?',' ','? ?','???? ????? ??? ','? ?','?? ?','A','A','B','B','C']})
I am trying to flag rows where users entered question marks as the category. The code below works in that it sets the flag for every row containing a question mark, but it also adds the Y flag to rows that are blank in that column.
df['?_Flag'] = np.where(df['Category'].str.contains(r"\?"), 'Y', '')
Do I need to use match instead?
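For comparison, here is a minimal sketch of a literal-match variant: `regex=False` makes `str.contains` look for a plain "?" substring instead of a regex pattern, and `na=False` treats missing values as non-matches. The small `sample` frame is hypothetical and only stands in for the real data, so it may not reproduce whatever is in the raw txt file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, not the raw txt data
sample = pd.DataFrame({'ID': ['12374', '2619', '2621', '174572'],
                       'Category': [' ', '???? ?????', '? ?', 'A']})

# regex=False matches a literal "?"; na=False counts missing values as no match
mask = sample['Category'].str.contains('?', regex=False, na=False)
sample['?_Flag'] = np.where(mask, 'Y', '')
print(sample)
```

On this sample only the two rows that contain a literal question mark get the Y flag.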
This is the dataframe I get:
ID      Category        ?_Flag
12374                   Y
19352                   Y
21014                   Y
2619    ???? ?????      Y
2621    ? ?             Y
9566                    Y
9686    ?? ?            Y
61319                   Y
68086                   Y
69239                   Y
69353   ?? ?            Y
69373                   Y
69491   ? ?             Y
69535   ???? ????? ???  Y
69582   ? ?             Y
69691   ?? ?            Y
174572  A
174637  A
174646  B
175286  B
175390  C
Could it be related to the datatype?
df.info()
First_Name_E 197357 non-null object
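Since `object` only says the column holds Python objects, one way to dig further is to look at the values themselves. The following is a rough sketch, assuming the "blank" cells might not all be a plain space; the values in `col` are hypothetical and not taken from the real file.

```python
import pandas as pd

# Hypothetical values: a plain space, a non-breaking space, a missing value,
# a question-mark entry and a normal category
col = pd.Series([' ', '\u00a0', None, '? ?', 'A'], name='Category', dtype=object)

# repr() and the Python type show what each distinct value really is
for value in col.unique():
    print(repr(value), type(value).__name__)

# Code points of each string value reveal characters that only look blank
codes = col.dropna().map(lambda s: [hex(ord(ch)) for ch in s])
print(codes.tolist())
```

Running the same two checks on the real `Category` column should show whether the blank-looking cells are actual spaces, missing values, or some other character.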