Str.contains in python pandas also flags blank

Question

For the following df (please note that the df I am actually working with is read in from raw data in a txt file, not created in Python as in this example):

import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': ['12374' ,'19352','21014','2619','2621','9566','9686','61319','68086','69239','69353', '69373','69491','69535','69582','69691','174572','174637','174646','175286','175390'], 
                   'Category': [' ', ' ', ' ', '???? ?????','? ?',' ','?? ?',' ',' ',' ','?? ?',' ','? ?','???? ????? ??? ','? ?','?? ?','A','A','B','B','C']})

I am trying to flag rows where users entered a question mark as the category. It works in that every row containing a question mark gets the flag, but it also adds the Y flag to rows that are blank in that column.

df['?_Flag'] = np.where(df['Category'].str.contains(r"\?"), 'Y', '')

Do I need to use match instead?

This is the dataframe I get:

ID      Category    ?_Flag
12374                  Y
19352                  Y
21014                  Y
2619    ???? ?????     Y
2621    ? ?            Y
9566                   Y
9686    ?? ?           Y
61319                  Y
68086                  Y
69239                  Y
69353   ?? ?           Y
69373                  Y
69491   ? ?            Y
69535   ???? ????? ??? Y
69582   ? ?            Y
69691   ?? ?           Y
174572  A
174637  A
174646  B
175286  B
175390  C

Could it be related to the datatype?

df.info()

First_Name_E  197357 non-null object

Please read [this](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) to provide a reproducible example. – juanpa.arrivillaga Mar 27 '17 at 21:08
Second @juanpa.arrivillaga's comment. I don't see why your answer won't work as is. We need to see sample data where you can reproduce the problem. – piRSquared Mar 27 '17 at 21:25
Also to add, this person seems to not be putting much effort into their own project, seeing as they have asked 5 questions regarding this df in the last ~2 hours. We are happy to help, but not to do your whole task for you; how do you plan to learn? – gold_cy Mar 27 '17 at 21:32
@DmitryPolonskiy, this is not a productive comment. I admit that I am very new to strings and regex, and I am sorry if I have many basic questions. But I am working very hard on learning it as fast as possible and have been working a lot on my project. The reason I asked so many questions in the last 2 hours is that I summarized all the roadblocks I had hit, where I couldn't find a workaround or explanation in the Python documentation or previous Stack Overflow answers. – jeangelj Mar 27 '17 at 21:47
@jeangelj, I [can't reproduce it](http://stackoverflow.com/a/43056545/5741205)... – MaxU - stand with Ukraine Mar 27 '17 at 21:52
@MaxU thank you very much for trying, it must be related to the raw data then; I'm trying a workaround to add a 0 where there is no value, maybe this will work – jeangelj Mar 27 '17 at 22:02

MaxU - stand with Ukraine · Answer 1 · 2017-03-27T21:51:02.810

0

I can't reproduce your issue using Pandas 0.19.2:

In [16]: df['?_Flag'] = np.where(df['Category'].str.contains(r"\?"), 'Y', '')

In [17]: df
Out[17]:
        ID        Category ?_Flag
0    12374
1    19352
2    21014
3     2619      ???? ?????      Y
4     2621             ? ?      Y
5     9566
6     9686            ?? ?      Y
7    61319
8    68086
9    69239
10   69353            ?? ?      Y
11   69373
12   69491             ? ?      Y
13   69535  ???? ????? ???      Y
14   69582             ? ?      Y
15   69691            ?? ?      Y
16  174572               A
17  174637               A
18  174646               B
19  175286               B
20  175390               C
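One possible explanation for the difference, offered only as a guess: if the blanks in the raw file are truly empty fields rather than a space character, pandas reads them in as missing values (NaN), which the hand-built example above does not contain. The sketch below uses a small made-up CSV string (a hypothetical stand-in for the real txt file, whose layout is not shown in the question) to check for that:

```python
import io
import pandas as pd

# Hypothetical stand-in for the raw txt file: the first row has an
# empty Category field instead of a space.
raw = io.StringIO("ID,Category\n12374,\n2619,???? ?????\n174572,A\n")
df_raw = pd.read_csv(raw, dtype={"ID": str})

# Empty fields are parsed as missing values (NaN), not as '' or ' '.
is_missing = df_raw["Category"].isna()
n_missing = int(is_missing.sum())
print(is_missing.tolist())
print(n_missing)
```

If the count is greater than zero on the real data, the "blank" cells are NaN rather than strings, which would explain why the example built in Python does not reproduce the problem.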

edited Mar 27 '17 at 21:51

answered Mar 27 '17 at 21:09

MaxU - stand with Ukraine


Curious: why would using regex flag blank columns? – juanpa.arrivillaga Mar 27 '17 at 21:11
@juanpa.arrivillaga, i've added an explanation into my answer - please check – MaxU - stand with Ukraine Mar 27 '17 at 21:13
Right, but they've used an escape sequence, so it shouldn't matter. – juanpa.arrivillaga Mar 27 '17 at 21:14
Thank you both, it must be somehow related to the raw data. I checked the fields both within the python dataframe and the raw data and they are blank - I am working on a workaround to add a 0 for blanks, so maybe this way the flag won't think it's a question mark – jeangelj Mar 27 '17 at 22:05

score 0 · Answer 2 · edited May 13 '19 at 15:42

df['?_Flag'] = np.where(df['Category'].str.contains(r"\?", na=False), 'Y', '')

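"na=False" will give the correct result. If the blank cells were read in from the raw file as missing values (NaN), str.contains returns NaN for them by default, and np.where treats NaN as truthy, so those rows get 'Y'. With na=False, missing values are treated as non-matches.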
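A minimal sketch of this fix on illustrative data, assuming the blank cells are missing values (None stands in for them here; the name df_demo and the sample values are made up for the example):

```python
import numpy as np
import pandas as pd

# Illustrative data: None stands in for a blank cell read in as missing.
df_demo = pd.DataFrame({"Category": [None, "?? ?", "A"]}, dtype=object)

# na=False makes missing values count as "no match" instead of NaN.
mask = df_demo["Category"].str.contains(r"\?", na=False)
df_demo["?_Flag"] = np.where(mask, "Y", "")

# Alternative with the same effect: fill missing values with ''
# before matching, so there is nothing left to evaluate as NaN.
mask_alt = df_demo["Category"].fillna("").str.contains(r"\?")

print(df_demo)
```

Only the row containing a question mark is flagged; the missing row and the 'A' row get an empty string.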

edited May 13 '19 at 15:42

double-beep


answered May 13 '19 at 15:39

user11493744

