using groupby attribute in pandas

Question

I have a DataFrame with 3 columns (A,B,C) and a large number of rows. There are different types of elements in each of these columns: A1,A2... B1,B2... and C1, C2... respectively.

I want to find the number of times a particular combination (say (A1,B2,C2)) occurs in a row. Then I want to generate a (summary) list of all combinations whose frequency-counts exceed a fixed threshold. Example: combinations number of counts (A1,B2,C2) 5 (A2,B2,C2) 7 .... if the fixed value is 6.

I am new to pandas and numpy. Can this be done efficiently using pandas, and if so how?

You need to add example code and data. Also, just call your columns A,B,C and their values A1,A2,..., B1,B2,... ,C1,C2,... for clarity. — smci, Jun 06 '17 at 08:39
Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. — jezrael, Jun 06 '17 at 08:39
Yes it can be done in pandas. When you say *"I want to find the no. of times a particular combination (a_i,b_j,c_k) occurs in a row."*, do you want your output to be a table, or what? Show us example output. Then, write a better title, because *"iterating over a dataframe using pandas"* tells us nothing. — smci, Jun 06 '17 at 08:40
@smci I want to get a list of all those combinations whose counts are greater than a fixed value. for ex: combinations No. of counts (A1,B2,C2) 5 (A2,B2,C2) 7 .... if the fixed value is 6. then i want a list of combinations for which count is greater than 6. — mihir shanvir, Jun 06 '17 at 10:34
@MihirShanvir - check my answer, if some problem let me know. — jezrael, Jun 06 '17 at 10:47
@MihirShanvir: then please edit those new requirements into your question (click the 'Edit' button above). The original statement was very unclear ("iterate over dataframe"), that's why it got downvoted and closed. — smci, Jun 07 '17 at 17:38
@MihirShanvir: useful terms to use here: *"frequency-count"* and *"generate table of frequency-counts for variable combinations"*, then *"apply a threshold min-count to that table"*. — smci, Jun 07 '17 at 17:42
@MihirShanvir so in your example where the frequency-counts are (A1,B2,C2) 5 (A2,B2,C2) 7 and your threshold is 6, should the list exclude (A1,B2,C2) or not? — smci, Jun 07 '17 at 17:44

score 1 · Accepted Answer · edited Jun 07 '17 at 17:36

1

df = pd.DataFrame({'A':['A1','A1','A2','A3'],
                   'B':[4,4,6,4],
                   'C':[7,7,9,7]})

print (df)
    A   B  C
0   4   7  C1
1   4   7  C1
2   6   9  C2
3   4   7  C3

For count of all combination use groupby + size:

s = df.groupby(["A", "B","C"]).size()
print (s)
A   B   C
A1  4   7     2
A2  6   9     1
A3  4   7     1
dtype: int64

For filtered list by value add boolean indexing:

L = s.index[s > 1].tolist()
print (L)
[('A1', 4, 7)]

edited Jun 07 '17 at 17:36

smci

32,567
20
113
146

answered Jun 06 '17 at 08:46

jezrael

822,522
95
1,334
1,252

Thanks. this is exactly what i wanted. – mihir shanvir Jun 06 '17 at 10:54
how can get the size value? i.e. how can i find the size when i know the combination say (A1,4,7). – mihir shanvir Jun 14 '17 at 04:34
Use `print (s.loc[L])` and if only one value and need scalar use `print (s.loc[L].values[0])` – jezrael Jun 14 '17 at 05:01

Pietro Tortella · Answer 2 · 2017-06-06T08:59:43.370

0

Using only pandas, one way could be to use DataFrame.groupby():

counts = dict()

for group in df.groupby(['It', 'Cc', 'Ct']):
    print(group)
    counts[group[0]] = len(group[1])

A faster way is probably to transform your dataframe into a list and use Counter from collections:

from collections import Counter

listed_df = [tuple(line) for line in list(df.values)]
counts = Counter(listed_df)

edited Jun 06 '17 at 08:59

answered Jun 06 '17 at 08:53

Pietro Tortella

1,084
1
6
13

using groupby attribute in pandas

2 Answers2