Pandas Group by 2 columns and count instances of T and F to create 2 new columns

Question

I have a pandas dataframe with 3 columns: source_name, dest_address, and fall_between. I would like to group by the first 2 columns and create 2 new columns based off of the fall_between column. This is what the df looks like:

df           
   source_name  dest_address   fall_between
0  source_1     72.21.215.90   False
1  source_1     72.21.215.90   False
2  source_1     72.21.215.90   False
3  source_1     72.21.215.90   False
4  source_1     131.107.0.89   False
5  source_1     131.107.0.89   False
6  source_2     69.63.191.1    False
7  source_2     69.63.191.1    True
8  source_2     69.63.191.1    True
9  source_2     69.63.191.1    True
10 source_2     69.63.191.1    True

Desired output:

df
   source_name  dest_address   true_count  false_count
0  source_1     72.21.215.90   0           4
1  source_1     131.107.0.89   0           2  
2  source_2     69.63.191.1    4           1

I was using the following but I am not getting a count if it is 0. What is a better way to do this?

df[df['fall_between'] == True].groupby(['source_name','dest_address']).size().reset_index(name='true_count')
df[df['fall_between'] == False].groupby(['source_name','dest_address']).size().reset_index(name='false_count')

What will became if you substitute `size()` with `count()`? – Alex Yu Feb 14 '19 at 18:50 — Alex Yu, Feb 14 '19 at 18:50
I get the same result @AlexYu – sectechguy Feb 14 '19 at 19:09 — sectechguy, Feb 14 '19 at 19:09

score 4 · Accepted Answer · answered Feb 14 '19 at 19:12

4

You can use pd.crosstab:

pd.crosstab([df.source_name, df.dest_address], df.fall_between).reset_index()

fall_between  source_name  dest_address  False  True
0               source_1  131.107.0.89      2     0
1               source_1  72.21.215.90      4     0
2               source_2   69.63.191.1      1     4

answered Feb 14 '19 at 19:12

yatu

86,083
12
84
139

1

This is exactly what I am looking for. I will be sure to read the doc as I have never used crosstab before. Thank you @yatu – sectechguy Feb 14 '19 at 19:16

score 2 · Answer 2 · answered Feb 14 '19 at 19:13

2

Since you want to include the 0s, I'm wondering if it makes more sense to use .value_counts().

Take a look: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html

answered Feb 14 '19 at 19:13

clarktwain

210
1
3
13

Pandas Group by 2 columns and count instances of T and F to create 2 new columns

2 Answers2