How to avoid a string containing a certain word during a count through Pandas

Question

Having created a dataframe:

df_lev = df[["column_name"]]

Now I'm counting the unique values in this column with:

df_lev["column_name"].drop_duplicates().count().sum()

Question: there is one string (containing a specific word) that I need to skip during this count.

Is there a way to skip it? How can I exclude that string from the count?

Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. — jezrael, Mar 26 '18 at 10:23

score 0 · Accepted Answer · answered Mar 26 '18 at 10:26

First, we can make your original query simpler.

pandas has a method, nunique, which counts the number of unique values in a column:

df_lev["column_name"].nunique()

Or

df_lev.column_name.nunique()
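As a quick check, here is a minimal sketch comparing the question's drop_duplicates().count() approach with nunique on a small made-up Series (the sample values are assumptions for illustration only). Both count distinct non-null values, so the trailing .sum() in the question isn't needed: count() on a Series already returns a single number. nunique also accepts dropna=False if missing values should count as one extra distinct value.

```python
import pandas as pd

# Hypothetical sample data, including one missing value
s = pd.Series(["a", "b", "a", None, "c"])

# The question's approach: drop repeated values, then count the non-null ones
via_drop = s.drop_duplicates().count()

# The same count in a single call; missing values are ignored by default (dropna=True)
via_nunique = s.nunique()

# Count the missing value as its own distinct entry
with_nan = s.nunique(dropna=False)

print(via_drop, via_nunique, with_nan)  # 3 3 4
```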

Using the following dataframe (df_lev):

import pandas as pd

df_lev = pd.DataFrame({'column_name':['String I want','String I dont want', 'String I want', 'String I want 2', 'String I dont want']})

Printed, it looks like this:

print(df_lev)
          column_name
0       String I want
1  String I dont want
2       String I want
3     String I want 2
4  String I dont want

By eye, we can see that there are 3 unique string values in column_name. We can also use nunique as mentioned before:

print(df_lev.column_name.nunique())
3

If there is a string you don't want to include in the count, in this case 'String I dont want', we can create a new df that excludes it (see this thread for more: Deleting DataFrame row in Pandas based on column value):

df_lev_new = df_lev[df_lev.column_name != 'String I dont want']

This removes every row holding the unwanted string, leaving the new df with only 2 unique string values:

print(df_lev_new)
       column_name
0    String I want
2    String I want
3  String I want 2

Which we can count using nunique as before:

print(df_lev_new.column_name.nunique())
2
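The filter above drops only exact matches, while the question title asks about strings that contain a certain word. A minimal sketch of the substring version, assuming the word to skip is "dont" (a placeholder; swap in your own word): Series.str.contains builds a boolean mask and ~ inverts it, so only rows without the word are counted. isin is shown as an alternative for excluding several exact strings at once.

```python
import pandas as pd

df_lev = pd.DataFrame({'column_name': ['String I want', 'String I dont want',
                                       'String I want', 'String I want 2',
                                       'String I dont want']})

word = "dont"  # hypothetical word to skip; replace with your own

# True for every row whose text contains the word.
# regex=False matches the word literally, case=False ignores case,
# na=False treats missing values as "does not contain the word".
contains_word = df_lev["column_name"].str.contains(word, case=False, regex=False, na=False)

# Keep only the rows without the word, then count the distinct values
count_without_word = df_lev.loc[~contains_word, "column_name"].nunique()

# Alternative: exclude one or more exact strings with isin
unwanted = ["String I dont want"]
count_without_exact = df_lev.loc[~df_lev["column_name"].isin(unwanted), "column_name"].nunique()

print(count_without_word, count_without_exact)  # 2 2
```

Note that a plain substring match also catches longer words that contain it; if that matters, a regex with word boundaries such as r"\bdont\b" (with regex=True) is one option.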

I've used Python 3.x for this. If you're using Python 2.x, you can write print x instead of print(x).
