Keeping all columns in pandas groupby

Question

I am trying to run a groupby function in pandas. I appreciate that there are already a lot of posts answering this but I can't get any of the answers to work. I have data as follows:

This is the input

index	created_at	full_text
0	date time object	lots of text in here
1	date time object	lots of text in here

I need to group the data by the created_at column so that posts made on the same date are grouped together, and then get the size of each group. I need to keep all the other columns.

I have tried various implementations as variations on the code below. I have also tried converting the datetime object to just the date, but that doesn't group the column properly. Simply trying:

df_mh.groupby(['created_at', 'full_text']).size()

Doesn't group the date properly.

This is my code at the moment:

df_mh = df_final.loc[df_final['full_text'].str.contains('mental health')]

mh_grouped = df_mh.groupby(df_mh.created_at.dt.date, as_index = False)['full_text'].size()

This is the output with the current code. Everything works, except that the full_text column is missing:

index	created_at	size
0	2020-07-12	2
1	2020-10-12	1

So this works but misses off the full_text column.

How can I write it so that all the columns are preserved?

Edit sample data:

[{'created_at': 'Sun Mar 22 04:11:34 +0000 2020', 'full_text': 'this is the full text of the first post'},
 {'created_at': 'Sun Mar 22 04:11:34 +0000 2020', 'full_text': 'this is the full text of the second post'},
 {'created_at': 'Sun Mar 24 04:11:34 +0000 2020', 'full_text': 'this is the full text of the third post'}]

So my required process is: convert the timestamp to just the day, month and year, then aggregate the posts so that those written on the same day are grouped and counted. I hope that makes sense.

Can you make a meaningful example and make sure that the expected output matches it? – Timeless, Jun 19 '23 at 19:47

score 0 · Answer 1 · answered Jun 19 '23 at 19:23

Using transform on a groupby object returns a result with the same number of rows as the original DataFrame, which you can then assign as a new column.

df["size"] = df.groupby("created_at")["full_text"].transform("count")
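To tie this to the question's goal of counting posts per calendar day, here is a minimal sketch. It assumes the timestamps are strings in the format shown in the sample data and that an extra helper column (named `date` here, which is my own choice) is acceptable. The sample rows are illustrative, not the asker's real data.

```python
import pandas as pd

# Illustrative rows modelled on the question's sample data.
df = pd.DataFrame(
    {
        "created_at": [
            "Sun Mar 22 04:11:34 +0000 2020",
            "Sun Mar 22 04:11:34 +0000 2020",
            "Tue Mar 24 04:11:34 +0000 2020",
        ],
        "full_text": [
            "this is the full text of the first post",
            "this is the full text of the second post",
            "this is the full text of the third post",
        ],
    }
)

# Parse the timestamp strings (assumed format) into timezone-aware datetimes.
df["created_at"] = pd.to_datetime(
    df["created_at"], format="%a %b %d %H:%M:%S %z %Y", utc=True
)

# Helper column holding only the calendar date (hypothetical column name).
df["date"] = df["created_at"].dt.date

# transform returns one value per original row, so no rows or columns are lost.
# Note that "count" skips missing values in full_text.
df["size"] = df.groupby("date")["full_text"].transform("count")
```

Every original row and column is kept, and each row carries the number of posts made on the same day.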

Michael Cao

Thank you for your response, I have tried that and it still misses off the full_text column and gives a bunch of NaN in the size column. What am I doing wrong? – frogger Jun 19 '23 at 19:29

score 0 · Answer 2 · answered Jun 19 '23 at 19:31

You can join the original dataframe to the grouped dataframe on the created_at field. Something like this:

df \
    .groupby("created_at", as_index=False) \
    ["full_text"] \
    .size() \
    .merge(df, on="created_at")
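The merge only lines up per day if both sides share a date-only key. The sketch below shows one way that could look, assuming `created_at` has already been parsed to datetimes and using a hypothetical helper column `date` as the join key. The data is illustrative.

```python
import pandas as pd

# Illustrative rows; created_at is assumed to be parsed already.
df = pd.DataFrame(
    {
        "created_at": pd.to_datetime(
            ["2020-03-22 04:11:34", "2020-03-22 09:30:00", "2020-03-24 04:11:34"]
        ),
        "full_text": ["first post", "second post", "third post"],
    }
)

# Date-only key (hypothetical column name) so posts from the same day match.
df["date"] = df["created_at"].dt.date

# One row per day with the number of posts on that day.
counts = df.groupby("date").size().reset_index(name="size")

# Left merge keeps every original row and attaches the per-day count.
merged = df.merge(counts, on="date", how="left")
```

Here `reset_index(name="size")` is used instead of `as_index=False` purely to make the name of the count column explicit.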

Maria K

Thank you for your response but that doesn't work either – frogger Jun 19 '23 at 19:35

score 0 · Answer 3 · answered Jun 19 '23 at 19:41

Try this:

df = df.groupby("created_at", as_index=False).size().assign(full_text=df["full_text"])

Himanshu Panwar

Thanks but that returns one row of full text and the rest NaN – frogger Jun 19 '23 at 19:47
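The NaN values are likely because assign aligns the full_text Series on the index, and the grouped frame has a fresh index that matches few of the original row labels. If one row per day is wanted, a possible alternative is to aggregate the text alongside the count. This is a sketch under that assumption, with a hypothetical `date` helper column and illustrative data. Collecting the texts into a list is my own choice, not something the question specifies.

```python
import pandas as pd

# Illustrative rows; created_at is assumed to be parsed already.
df = pd.DataFrame(
    {
        "created_at": pd.to_datetime(
            ["2020-03-22 04:11:34", "2020-03-22 09:30:00", "2020-03-24 04:11:34"]
        ),
        "full_text": ["first post", "second post", "third post"],
    }
)

# Date-only grouping key (hypothetical column name).
df["date"] = df["created_at"].dt.date

# Named aggregation: count the posts per day and collect their text in a list.
per_day = df.groupby("date", as_index=False).agg(
    size=("full_text", "size"),
    full_text=("full_text", list),
)
```

This gives one row per day with both the group size and all of that day's text, at the cost of no longer having one row per post.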
