
For the following df (note that the df I am actually working with is raw data read in from a txt file, not the df below, which I created in Python for this example):

import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': ['12374' ,'19352','21014','2619','2621','9566','9686','61319','68086','69239','69353', '69373','69491','69535','69582','69691','174572','174637','174646','175286','175390'], 
                   'Category': [' ', ' ', ' ', '???? ?????','? ?',' ','?? ?',' ',' ',' ','?? ?',' ','? ?','???? ????? ??? ','? ?','?? ?','A','A','B','B','C']}) 

I am trying to flag the rows where users entered question marks as the category. It works in that it sets the flag for every row containing a question mark, but it also adds the Y flag to rows that are blank in that column.

df['?_Flag'] = np.where(df['Category'].str.contains(r"\?"), 'Y', '')

Do I need to use match instead?

This is the dataframe I get:

ID      Category    ?_Flag
12374                  Y
19352                  Y
21014                  Y
2619    ???? ?????     Y
2621    ? ?            Y
9566                   Y
9686    ?? ?           Y
61319                  Y
68086                  Y
69239                  Y
69353   ?? ?           Y
69373                  Y
69491   ? ?            Y
69535   ???? ????? ??? Y
69582   ? ?            Y
69691   ?? ?           Y
174572   A
174637   A
174646   B
175286   B
175390   C

Could it be related to the datatype?

df.info()

First_Name_E  197357 non-null object
jeangelj
  • Please read [this](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) to provide a reproducible example. – juanpa.arrivillaga Mar 27 '17 at 21:08
  • Second @juanpa.arrivillaga's comment. I don't see why your answer won't work as is. We need to see sample data where you can reproduce the problem. – piRSquared Mar 27 '17 at 21:25
  • also to add, this person seems to not be putting much effort in to their own project seeing as they have asked 5 questions regarding this df in the last ~2 hours. we are happy to help, but not do your whole task for you, how do you plan to learn? – gold_cy Mar 27 '17 at 21:32
  • @DmitryPolonskiy; this is not a productive comment; I admit that I am very new to strings and regex and I am sorry if I have many basic questions; but I am working very hard on learning it as fast as possible and have been working a lot on my project and the reason I asked so many questions in the last 2 hours is because I summarized all the roadblocks I had, where I couldn't find a workaround or explanation in the python documentation or previous stackoverflow answers – jeangelj Mar 27 '17 at 21:47
  • @jeangelj, i [can't reproduce it](http://stackoverflow.com/a/43056545/5741205)... – MaxU - stand with Ukraine Mar 27 '17 at 21:52
  • @MaxU thank you very much for trying, it must be related to the raw data then; I'm trying workaround to add a 0 where there is no value, maybe this will work – jeangelj Mar 27 '17 at 22:02

2 Answers


I can't reproduce your issue using Pandas 0.19.2:

In [16]: df['?_Flag'] = np.where(df['Category'].str.contains("\?"), 'Y', '')

In [17]: df
Out[17]:
        ID        Category ?_Flag
0    12374
1    19352
2    21014
3     2619      ???? ?????      Y
4     2621             ? ?      Y
5     9566
6     9686            ?? ?      Y
7    61319
8    68086
9    69239
10   69353            ?? ?      Y
11   69373
12   69491             ? ?      Y
13   69535  ???? ????? ???      Y
14   69582             ? ?      Y
15   69691            ?? ?      Y
16  174572               A
17  174637               A
18  174646               B
19  175286               B
20  175390               C
MaxU - stand with Ukraine
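Pass na=False to str.contains:

df['?_Flag'] = np.where(df['Category'].str.contains(r"\?", na=False), 'Y', '')

By default, str.contains returns NaN for missing values, and np.where treats NaN as truthy, so those rows get the 'Y' flag. With na=False, missing values count as non-matches. This suggests that the "blank" cells in the raw data are NaN rather than the ' ' strings used in the example, which would explain why the example could not be reproduced.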

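A minimal sketch of the difference, assuming the blank cells in the raw file were read in as missing values (the question does not show the raw data, so the rows below are hypothetical and use None as a stand-in). Since bool(float("nan")) is True in Python, a NaN left in the mask ends up on the 'Y' side of np.where:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: None stands in for fields that were empty in the txt file.
df = pd.DataFrame({
    "ID": ["12374", "2619", "2621", "174572"],
    "Category": pd.Series([None, "???? ?????", "? ?", "A"], dtype=object),
})

# Without na=..., missing entries are typically left as NaN in the mask
# (for object-dtype columns), and np.where would treat them as truthy.
mask_default = df["Category"].str.contains(r"\?")

# na=False maps missing entries to False, so only real matches are flagged.
mask_fixed = df["Category"].str.contains(r"\?", na=False)
df["?_Flag"] = np.where(mask_fixed, "Y", "")

print(mask_default.tolist())
print(df)
```

Checking df['Category'].isna().sum() on the real data is a quick way to confirm whether this assumption holds.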
double-beep