2

In sql, select a.*,count(a.id) as N from table a group by a.name would give me a new column 'N'containing the count as per my group by specification.

However in pandas, if I try df['name'].value_counts(), I get the count but not as a column in the original dataframe.

Is there a way to get the count as a column in the original dataframe in a single step/statement?

babsdoc
  • 693
  • 2
  • 11
  • 24
  • Not many of us know SQL enough to figure out what that query does so it would be helpful if you include a sample dataset with the desired output. From your description, `df.groupby('name').transform('count')` might work. – ayhan Jul 30 '17 at 13:49

1 Answers1

4

It seems you need groupby + transform function size:

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'name':list('aaabcc')})

print (df)
   A  B  C  D  E name
0  a  4  7  1  5    a
1  b  5  8  3  3    a
2  c  4  9  5  6    a
3  d  5  4  7  9    b
4  e  5  2  1  2    c
5  f  4  3  0  4    c

df['new'] = df.groupby('name')['name'].transform('size')
print (df)
   A  B  C  D  E name  new
0  a  4  7  1  5    a    3
1  b  5  8  3  3    a    3
2  c  4  9  5  6    a    3
3  d  5  4  7  9    b    1
4  e  5  2  1  2    c    2
5  f  4  3  0  4    c    2

What is the difference between size and count in pandas?

Graham
  • 7,431
  • 18
  • 59
  • 84
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Hi Jezrael, I need the output as you have shown in your example. However when I try to do the same I get the following warning/error: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead – babsdoc Jul 30 '17 at 13:58
  • I think there is problem in one row above and need `copy()` - check [this](https://stackoverflow.com/a/45170475/2901002) – jezrael Jul 30 '17 at 14:00
  • Here is what I am doing dat1=pd.read_csv("C:\\Test.csv") dat2=dat1[['Stock', 'Date', 'AdjClose', 'Volume']] dat2['count'] = dat2.groupby('Stock')['Stock'].transform('count') – babsdoc Jul 30 '17 at 14:04
  • 1
    So need `dat2=dat1[['Stock', 'Date', 'AdjClose', 'Volume']].copy()` – jezrael Jul 30 '17 at 14:05
  • If you modify values in `dat2` after you will find that the modifications do not propagate back to the original data (`dat1`), and that Pandas does warning. – jezrael Jul 30 '17 at 14:07
  • It create `deep` copy of object for new df, because in pandas indexing a DataFrame returns a reference to the initial DataFrame. – jezrael Jul 30 '17 at 14:20