Questions tagged [pandas-groupby]

Use this tag for questions about grouping data based on a given condition. Use it only for questions that relate to the `pandas` library.

`pandas.DataFrame.groupby` lets you split the rows of a DataFrame into groups based on the values of one or more columns.

After grouping, you can compute the mean of each group or apply other aggregations.
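As a rough illustration of the grouping and aggregation described above, here is a minimal sketch. The sample data and the column names `city` and `sales` are assumptions made up for this example; they do not come from any of the questions listed below.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "city": ["Seattle", "Seattle", "Portland", "Portland"],
    "sales": [10, 20, 30, 50],
})

# Split the rows by "city", then take the mean of "sales" within each group.
mean_sales = df.groupby("city")["sales"].mean()

# Apply several aggregations to the same column in one call with agg().
summary = df.groupby("city")["sales"].agg(["mean", "sum", "count"])

print(mean_sales)
print(summary)
```

The first result is a Series indexed by the group keys. Passing a list of function names to `agg()` returns a DataFrame with one column per aggregation.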

8780 questions
673
votes
12 answers

Converting a Pandas GroupBy output from Series to DataFrame

I'm starting with input data like this:

    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    })

Which when printed…
saveenr
552
votes
16 answers

How to group dataframe rows into list in pandas groupby

I have a pandas data frame df like:

    a  b
    A  1
    A  2
    B  5
    B  5
    B  4
    C  6

I want to group by the first column and get the second column as lists in rows:

    A  [1,2]
    B  [5,5,4]
    C  [6]

Is it possible to do something like this using pandas groupby?
Abhishek Thakur
342
votes
4 answers

Count unique values per groups with Pandas

I need to count unique ID values in every domain. I have data:

    ID, domain
    123, 'vk.com'
    123, 'vk.com'
    123, 'twitter.com'
    456, 'vk.com'
    456, 'facebook.com'
    456, 'vk.com'
    456, 'google.com'
    789, 'twitter.com'
    789, 'vk.com'

I try df.groupby(['domain',…
Arseniy Krupenin
284
votes
3 answers

How to loop over grouped Pandas dataframe?

DataFrame: c_os_family_ss c_os_major_is l_customer_id_i 0 Windows 7 90418 1 Windows 7 90418 2 Windows 7 90418 Code: print df for name, group in…
Tjorriemorrie
280
votes
7 answers

pandas GroupBy columns with NaN (missing) values

I have a DataFrame with many missing values in columns which I wish to groupby: import pandas as pd import numpy as np df = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', np.NaN, '6']}) In [4]: df.groupby('b').groups Out[4]: {'4': [0], '6':…
Gyula Sámuel Karli
265
votes
5 answers

Multiple aggregations of the same column using pandas GroupBy.agg()

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times? Example dataframe: import pandas as pd import datetime as dt import numpy as…
ely
228
votes
8 answers

Concatenate strings from several rows using Pandas groupby

I want to merge several strings in a dataframe based on a groupedby in Pandas. This is my code so far: import pandas as pd from io import StringIO data =…
mattiasostmar
214
votes
6 answers

How to access pandas groupby dataframe by key

How do I access the corresponding groupby dataframe in a groupby object by the key? With the following groupby: rand = np.random.RandomState(1) df = pd.DataFrame({'A': ['foo', 'bar'] * 3, 'B': rand.randn(6), …
beardc
133
votes
12 answers

Pandas: filling missing values by mean in each group

This should be straightforward, but the closest thing I've found is this post: pandas: Filling missing values within a group, and I still can't solve my problem.... Suppose I have the following dataframe df = pd.DataFrame({'value': [1, np.nan,…
BlueFeet
129
votes
6 answers

group by pandas dataframe and select latest in each group

How to group values of pandas dataframe and select the latest(by date) from each group? For example, given a dataframe sorted by date: id product date 0 220 6647 2014-09-01 1 220 6647 2014-09-03 2 220 6647 …
DevEx
126
votes
4 answers

Group dataframe and get sum AND count?

I have a dataframe that looks like this: Company Name Organisation Name Amount 10118 Vifor Pharma UK Ltd Welsh Assoc for Gastro & Endo 2700.00 10119 Vifor Pharma UK Ltd Welsh IBD Specialist Group, 169.00 10120 …
Richard
111
votes
3 answers

Pandas Groupby and Sum Only One Column

So I have a dataframe, df1, that looks like the following: A B C 1 foo 12 California 2 foo 22 California 3 bar 8 Rhode Island 4 bar 32 Rhode Island 5 baz 15 Ohio 6 baz 26 …
JSolomonCulp
103
votes
6 answers

Groupby value counts on the dataframe pandas

I have the following dataframe: df = pd.DataFrame([ (1, 1, 'term1'), (1, 2, 'term2'), (1, 1, 'term1'), (1, 1, 'term2'), (2, 2, 'term3'), (2, 3, 'term1'), (2, 2, 'term1') ], columns=['id', 'group', 'term']) I want to…
Salvador Dali
100
votes
5 answers

Keep other columns when doing groupby

I'm using groupby on a pandas dataframe to drop all rows that don't have the minimum of a specific column. Something like this: df1 = df.groupby("item", as_index=False)["diff"].min() However, if I have more than those two columns, the other…
PointXIV
94
votes
5 answers

Counting non zero values in each column of a DataFrame in python

I have a python-pandas-DataFrame in which first column is "user_id" and rest of the columns are tags("Tag_0" to "Tag_122"). I have the data in the following format: UserId Tag_0 Tag_1 7867688 0 5 7867688 0 3 7867688 3 0 7867688 3.5…
Harsh Singal