count frequency of items in one column in relation to criteria in another column in python

Question

I have a data frame that looks something like this:

Category	Topic
Category1	Topic1
Category2	Topic2
Category1	Topic2
Category3	Topic3
Category2	Topic3
Category3	Topic3

And I want an output like this:

Category	Topic	Frequency
Category1	Topic1
	Topic2
	Topic3
Category2	Topic1
	Topic2
	Topic3
Category3	Topic1
	Topic2
	Topic3

I am just starting out with python and I'd really appreciate it if someone could help me out with this.

you can check out `groupby` but I'm guessing you may actually be looking for a pivot https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe — Chris, Apr 06 '22 at 12:07
What should go in the `frequency` column? The relative frequency of `topic1`, `topic2` etc. within a `category`? (e.g. the sum for the first three rows of your output example would be 1? — Pierre D, Apr 06 '22 at 12:07
Welcome to Stack Overflow. Take a look at the [guide](https://stackoverflow.com/help/how-to-ask) on how to ask a quality question. In particular, it's good to give a sense of what you already tried including things that you searched for on SO. — PeterK, Apr 06 '22 at 13:05

score 1 · Answer 1 · answered Apr 06 '22 at 12:15

If the frequency column is meant to capture the relative frequency of each topic within its category, a basic approach is:

df.groupby('Category')['Topic'].value_counts(normalize=True)

The result is a Series indexed by category and topic. On your input data, we get:

Category   Topic 
Category1  Topic1    0.5
           Topic2    0.5
Category2  Topic2    0.5
           Topic3    0.5
Category3  Topic3    1.0
Name: Topic, dtype: float64
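
If you want to try this yourself, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the variable names `freq` and `counts` are only illustrative. It also shows that dropping `normalize=True` gives raw counts instead of proportions, in case that is what you meant by frequency.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Category": ["Category1", "Category2", "Category1",
                 "Category3", "Category2", "Category3"],
    "Topic": ["Topic1", "Topic2", "Topic2",
              "Topic3", "Topic3", "Topic3"],
})

# Share of each topic within its category (each category sums to 1).
freq = df.groupby("Category")["Topic"].value_counts(normalize=True)

# Raw number of rows per (category, topic) pair.
counts = df.groupby("Category")["Topic"].value_counts()

print(freq)
print(counts)
```

Both results are Series with a two-level index (category, then topic), so a single value can be looked up with a tuple such as `freq.loc[("Category1", "Topic1")]`.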

Your example output appears to be a DataFrame with three columns. To get that shape, convert the Series to a frame and reset the index:

out = (
    df
    .groupby('Category')['Topic']
    .value_counts(normalize=True)
    .to_frame('frequency')
    .reset_index()
)

Again, on your input sample:

>>> out
    Category   Topic  frequency
0  Category1  Topic1        0.5
1  Category1  Topic2        0.5
2  Category2  Topic2        0.5
3  Category2  Topic3        0.5
4  Category3  Topic3        1.0
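
Note that the result above only lists the combinations that occur in the data, while your desired output lists every topic under every category. If you want those missing pairs too, one possible sketch (an assumption about what you want, not the only way) is to reindex against the full product of categories and topics and fill the gaps with 0. The names `full_index` and `full` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Category": ["Category1", "Category2", "Category1",
                 "Category3", "Category2", "Category3"],
    "Topic": ["Topic1", "Topic2", "Topic2",
              "Topic3", "Topic3", "Topic3"],
})

# Relative frequency of each topic within its category.
freq = df.groupby("Category")["Topic"].value_counts(normalize=True)

# Every (category, topic) pair, including ones absent from the data.
full_index = pd.MultiIndex.from_product(
    [sorted(df["Category"].unique()), sorted(df["Topic"].unique())],
    names=["Category", "Topic"],
)

# Pairs that never occur get a frequency of 0.0.
full = (
    freq
    .reindex(full_index, fill_value=0.0)
    .rename("Frequency")
    .reset_index()
)

print(full)
```

This gives nine rows (three categories times three topics), with 0.0 for pairs such as Category1/Topic3 that do not appear in the input.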
