First, we can simplify your original query. pandas has a method, nunique, which counts the number of unique values in a column, as follows:
df_lev["column_name"].nunique()
Or
df_lev.column_name.nunique()
Take the following dataframe (df_lev) as an example:
import pandas as pd
df_lev = pd.DataFrame({'column_name':['String I want','String I dont want', 'String I want', 'String I want 2', 'String I dont want']})
Printing it gives:
print(df_lev)
          column_name
0       String I want
1  String I dont want
2       String I want
3     String I want 2
4  String I dont want
By eye, we can see that there are 3 unique string values in column_name. We can confirm this with nunique, as mentioned before:
print(df_lev.column_name.nunique())
3
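Counting by eye stops being practical on a larger column. As a minimal sketch on the same example data, unique() lists the distinct values and value_counts() shows how often each one occurs, which makes it easier to spot a string you may want to exclude. The variable names distinct_values, counts and n_distinct are only illustrative.

```python
import pandas as pd

df_lev = pd.DataFrame({'column_name': ['String I want', 'String I dont want',
                                       'String I want', 'String I want 2',
                                       'String I dont want']})

# unique() returns the distinct values in order of first appearance
distinct_values = df_lev['column_name'].unique()
print(distinct_values)

# value_counts() shows how many rows hold each distinct value
counts = df_lev['column_name'].value_counts()
print(counts)

# nunique() is the number of distinct values (NaN is skipped by default)
n_distinct = df_lev['column_name'].nunique()
print(n_distinct)
```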
If there is a string you don't want to include in this count, in this case 'String I dont want', we can create a new dataframe that excludes it (see this thread for more: Deleting DataFrame row in Pandas based on column value):
df_lev_new = df_lev[df_lev.column_name != 'String I dont want']
This removes the rows containing the unwanted string, leaving our new dataframe with only 2 unique string values:
print(df_lev_new)
       column_name
0    String I want
2    String I want
3  String I want 2
Which we can count using nunique, as before:
print(df_lev_new.column_name.nunique())
2
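If you only need the count, the intermediate dataframe is optional. The sketch below, on the same example data, filters and counts in one expression, and then shows one way to exclude several strings at once with isin. The unwanted list and the names mask, n_wanted and n_remaining are hypothetical, chosen only for illustration.

```python
import pandas as pd

df_lev = pd.DataFrame({'column_name': ['String I want', 'String I dont want',
                                       'String I want', 'String I want 2',
                                       'String I dont want']})

# Boolean mask: True for every row we want to keep
mask = df_lev['column_name'] != 'String I dont want'

# Filter and count in one step, without storing a second dataframe
n_wanted = df_lev.loc[mask, 'column_name'].nunique()
print(n_wanted)

# To exclude several strings, negate isin() with ~
unwanted = ['String I dont want', 'String I want 2']  # hypothetical list
keep = ~df_lev['column_name'].isin(unwanted)
n_remaining = df_lev.loc[keep, 'column_name'].nunique()
print(n_remaining)
```

Keeping df_lev_new, as above, is still the better choice if you need the filtered rows for anything beyond the count.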
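I've used Python 3.x for this. If you're using Python 2.x, print is a statement rather than a function, so you can write print x instead of print(x). With a single argument, as in all the calls here, print(x) also works in Python 2.x.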