Counting Unique Values in a Column

Question

I have a df with one column that holds multiple comma-separated values in each row. I want to count how many times each unique value occurs in that column.

The df looks like this:

                             category  country
0  widget1, widget2, widget3, widget4      USA
1                    widget1, widget3      USA
2                   widget1, widget2     China
3                             widget2   Canada
4           widget1, widget2, widget3    China
5                             widget2  Vietnam
6                             widget3   Canada
7                    widget1, widget3      USA
8                    widget1, widget3    Japan
9                             widget2  Germany

I want to know how many times each widget appears in the column "category". The results in this example would be:

widget1 = 6, widget2 = 6, widget3 = 6, widget4 = 1

I can use .value_counts():

df["category"].value_counts()

but that treats each cell as a single string, so it only groups rows whose entire value is exactly the same.
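For reference, here is a minimal sketch of the usual fix. It assumes pandas 0.25 or later (for `Series.explode`) and that values are separated by a comma plus one space, as in the printout above. Each cell is split into a list, the lists are exploded into one value per row, and only then counted.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "category": [
        "widget1, widget2, widget3, widget4", "widget1, widget3",
        "widget1, widget2", "widget2", "widget1, widget2, widget3",
        "widget2", "widget3", "widget1, widget3", "widget1, widget3", "widget2",
    ],
    "country": ["USA", "USA", "China", "Canada", "China",
                "Vietnam", "Canada", "USA", "Japan", "Germany"],
})

# Whole-string counts: "widget1, widget3" is one opaque value here
whole = df["category"].value_counts()

# Split on ", ", put one value per row, then count individual widgets
per_widget = df["category"].str.split(", ").explode().value_counts()
print(per_widget)
```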

I could use value_counts and enter each value for it to count, but in the actual DataFrame there are too many rows and unique values in that column to make it practical.

Also, is there a way to avoid double counting when a single row contains the same value more than once? For example, if a row contained "widget1, black widget1, yellow widget1", I'd want to count that as just one widget1.
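A minimal sketch for the "count once per row" requirement. It assumes a hypothetical normalization rule: that a value like `black widget1` should be reduced to its last word (`widget1`). Swap in whatever rule fits the real data. Exploding keeps the original row label, so dropping duplicate (row, widget) pairs leaves each widget counted at most once per row.

```python
import pandas as pd

df = pd.DataFrame({"category": [
    "widget1, black widget1, yellow widget1",
    "widget1, widget2",
    "widget2, widget2",
]})

# One row per comma-separated value; the original row label is preserved
tokens = df["category"].str.split(",").explode().str.strip()

# Hypothetical normalization: keep only the last word ("black widget1" -> "widget1")
tokens = tokens.str.split().str[-1]

# Drop repeats of the same widget within a row, then count rows per widget
per_row = tokens.reset_index().drop_duplicates()
counts = per_row["category"].value_counts()
print(counts)
```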

Don't post pictures in your question, since people can't copy them. Your data is already in a Jupyter notebook, so you can simply run `print(df)` and copy and paste that output into your question. — Erfan, May 22 '19 at 15:56
Again: 1. `print(df)` 2. Select the output 3. press `ctrl + c` 4. Edit your question and press `ctrl + v` — Erfan, May 22 '19 at 16:07
That's what I did, but the formatting is a mess when I paste it in here. Ugh. What am I missing? — PythonFisher, May 22 '19 at 16:12
Fixed it here. If you want to insert code, put it between three of these (`) and close it again with three of those characters. — Erfan, May 22 '19 at 16:13
I get it. Thanks for taking the time. I wasn't thinking of the output as code. My bad. — PythonFisher, May 22 '19 at 16:15

score 4 · Accepted Answer · answered May 22 '19 at 15:56


Use str.get_dummies:

# sep=', ' matches the data's "comma + space"; summing each 0/1 column counts the rows containing that widget
df.category.str.get_dummies(', ').sum()

answered May 22 '19 at 15:56

BENY


That seems to work great! And will that ensure it doesn't double count if a value is repeated in the same row? – PythonFisher May 22 '19 at 16:01

@PythonFisher Yes. Dummies only record whether a value exists in a row or not, so if the same value appears twice it still shows as 1. – BENY May 22 '19 at 16:34
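A quick check of that claim, assuming the same `", "` separator as the sample data. `str.get_dummies` builds one 0/1 indicator column per distinct value, so an exact repeat within a row still yields 1. Note that `black widget1` and `widget1` are different strings and get separate columns, so near-duplicates like the ones in the question still need normalizing first.

```python
import pandas as pd

df = pd.DataFrame({"category": [
    "widget1, widget1, widget2",
    "widget1, black widget1",
]})

# One indicator column per distinct value; repeats within a row collapse to 1
dummies = df["category"].str.get_dummies(sep=", ")
counts = dummies.sum()
print(counts)
```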

score 1 · Answer 2 · answered May 22 '19 at 16:18

Another solution would be to unnest your string to rows, then use value_counts:

explode_str(df, 'category', ', ')['category'].value_counts()

widget2    6
widget1    6
widget3    6
widget4    1
Name: category, dtype: int64

Function used from linked answer:

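import numpy as np

def explode_str(df, col, sep):
    s = df[col]
    # repeat each row position once per value in that row
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    # join all strings, split into values, and assign one value per repeated row
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})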
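To see how `explode_str` lines values up with rows, here is a sketch of its two intermediate pieces for a three-row input (the separator `", "` is an assumption matching the sample data). Each row position is repeated once per value it contains, and joining then splitting the whole column yields the values in the same order, so the two line up one-to-one.

```python
import numpy as np
import pandas as pd

s = pd.Series(["widget1, widget3", "widget2", "widget1, widget2, widget3"])
sep = ", "

# Number of values per row = number of separators + 1
repeats = s.str.count(sep) + 1
# Row positions repeated accordingly
i = np.arange(len(s)).repeat(repeats)
# All values in row order, aligned one-to-one with i
tokens = sep.join(s).split(sep)
print(i.tolist())
print(tokens)
```

One design caveat: `Series.str.count` interprets its pattern as a regular expression, so a separator with special meaning (such as `|`) would need escaping with `re.escape` before being passed in.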

score 0 · Answer 3 · edited Aug 28 '21 at 04:24


This might not be the most elegant solution, but I think it should work. Basically, we need to split each entry in the category column on commas and then count the individual words.

from itertools import chain
import pandas as pd

# split each row on commas, then flatten and strip the surrounding spaces
words = [i.split(',') for i in df['category'].tolist()]
words = [i.strip() for i in chain.from_iterable(words)]
pd.Series(words).value_counts()
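The same idea also works with plain Python for the counting step. Below is a sketch using `collections.Counter`; it additionally wraps each row's values in a `set`, so an exact repeat within one row is counted once (this per-row deduplication is an addition, not part of the answer above).

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"category": ["widget1, widget3", "widget2, widget2", "widget1"]})

counts = Counter()
for cell in df["category"]:
    # split on commas, strip spaces, and deduplicate within the row
    counts.update({part.strip() for part in cell.split(",")})

print(counts.most_common())
```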

edited Aug 28 '21 at 04:24

PetKie


answered May 22 '19 at 15:59

iamchoosinganame


That didn't work for me. I'm trying to figure out why. – PythonFisher May 22 '19 at 16:07
