Good evening! I have a dataframe with columns "dialog_id", "type" and "message". Each row of the df describes one message. I want to group this df so that it shows how many messages of each type each dialog_id has sent. dialog_id is a string; message is an int (it's just a column of 1s).

1) What I have now:

| dialog_id | message | type  |
| --------- | ------- | ----- |
| name1     | 1       | video |
| name1     | 1       | text  |
| name2     | 1       | audio |

2) What I want to achieve (expected output). Here one row is a description of one dialog_id, not of one message:

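| dialog_id | video | text | more types | total sent messages |
| --------- | ----- | ---- | ---------- | ------------------- |
| name1     | 15    | 20   | x          | 35+x                |
| name2     | 17    | 3    | y          | 20+y                |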

The main problem is counting each type within each dialog. My groupby attempts count either per dialog or per type. For example, I tried:

```python
aggregation_functions = {'message': 'sum', 'type': 'first', 'dialog_id': 'first'}
df_n = df_var.groupby('dialog_id').aggregate(aggregation_functions)
```

but this and other variations don't really work. Please help.

  • So you first need to merge both DataFrames and then aggregate? What is the expected output? – jezrael Nov 21 '22 at 06:40
  • @jezrael There is only one dataframe (the first table); the expected output is the second table (where each row is a description of a dialog_id). – Софія Гринь Nov 21 '22 at 06:59
  • I don't understand; I cannot see how the data in `df1` could be converted to `df2`, e.g. `video=15,17` does not exist in df1. – jezrael Nov 21 '22 at 07:04
  • In other words, df1 does not match the df2 output. – jezrael Nov 21 '22 at 07:05
  • @jezrael No, video 15, 17 are counts. For example, if I have 15 video messages from one person, the initial dataframe stores them as 15 rows of "message": "1", "type": "video". I want to sum them so there is only one such row showing the amount. – Софія Гринь Nov 21 '22 at 08:07
  • Try `pd.crosstab(df_var['dialog_id'], df_var['type']).assign(**{'total sent messages': lambda x: x.sum(axis=1)})` – jezrael Nov 21 '22 at 08:46
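For reference, the `pd.crosstab` suggestion from the last comment can be tried on a small frame. This is only a sketch: the sample rows are made up to match the shape described in the question (they are not the real data), and `counts`, `result` and `alt` are illustrative names. It also shows a `pivot_table` variant that sums the `message` column, which should give the same counts as long as every `message` value is 1.

```python
import pandas as pd

# Hypothetical sample data in the shape described in the question
df_var = pd.DataFrame({
    "dialog_id": ["name1", "name1", "name1", "name2", "name2"],
    "type": ["video", "text", "video", "audio", "text"],
    "message": [1, 1, 1, 1, 1],
})

# Count rows for every (dialog_id, type) pair; absent pairs become 0
counts = pd.crosstab(df_var["dialog_id"], df_var["type"])

# Add a per-dialog total across all type columns
result = counts.assign(**{"total sent messages": counts.sum(axis=1)})

# Alternative: sum the `message` column instead of counting rows
# (equivalent here only because every message value is 1)
alt = df_var.pivot_table(
    index="dialog_id", columns="type", values="message",
    aggfunc="sum", fill_value=0,
)

print(result)
```

The result has one row per `dialog_id` and one column per message type, plus the total column, which is the layout of the second table in the question.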
