I am trying to use groupby and agg, but I receive an empty dataframe and no error.
When I do this:
df_temp = df.groupby('Col1')[['InfoType', 'InfoLabel1', 'InfoLabel2']].agg(lambda x: ', '.join(x))
then I receive the dataframe aggregated as expected.
When I do this:
df_temp = df.groupby(['Col1', 'Col2'])[['InfoType', 'InfoLabel1', 'InfoLabel2']].agg(lambda x: ', '.join(x))
then I receive the dataframe aggregated as expected.
When I do this:
df_temp = df.groupby(['Col1', 'Col2', 'Col3'])[['InfoType', 'InfoLabel1', 'InfoLabel2']].agg(lambda x: ', '.join(x))
then I receive the dataframe aggregated as expected.
But when I do this:
df_temp = df.groupby(['Col1', 'Col2', 'Col3', 'Col4'])[['InfoType', 'InfoLabel1', 'InfoLabel2']].agg(lambda x: ', '.join(x))
then I receive an empty dataframe and no error.
However, I do not think that the problem is Col4, because when I remove Col2 and still keep Col4, I receive the dataframe aggregated as expected.
Why is this happening?
'Col1', 'Col2', 'Col3' and 'Col4' are of different types, but I do not think that this is the problem, because 'Col1', 'Col2' and 'Col3' are also of different types and the aggregation works when I group by only these.
Can it be related to NAs in these columns?
P.S.
I know that it would be better to have specific examples of my data, but it would be too time-consuming to post them here and I also do not want to expose my data at all.
P.S.2
I did the following: before the groupby, I filled in the np.nan with values (e.g. -1 for floats and 'NA' for objects) and the code worked, so my initial hypothesis about the NAs was probably right. Feel free to share ideas about why this is happening.
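A minimal sketch that seems to reproduce the behaviour with made-up data (not my real data; the column values below are purely hypothetical). It assumes the cause is that groupby drops any row whose group key contains NaN by default, so if every row has a NaN in at least one of the key columns, nothing is left to aggregate. The dropna=False argument is only available in pandas 1.1 and later; filling the NaNs beforehand is the workaround I used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every row has a NaN in Col2 or in Col4.
df = pd.DataFrame({
    "Col1": ["a", "a", "b", "b"],
    "Col2": [1.0, np.nan, 2.0, np.nan],
    "Col4": [np.nan, "x", np.nan, "y"],
    "InfoType": ["t1", "t2", "t3", "t4"],
})


def join_values(values):
    """Concatenate the strings of one group into a single cell."""
    return ", ".join(values)


keys = ["Col1", "Col2", "Col4"]

# Col1 has no NaN, so every row ends up in a group.
by_col1 = df.groupby("Col1")[["InfoType"]].agg(join_values)

# Default behaviour: rows with a NaN in any key column are dropped.
by_default = df.groupby(keys)[["InfoType"]].agg(join_values)

# Option 1 (pandas >= 1.1): treat NaN as a group key of its own.
kept = df.groupby(keys, dropna=False)[["InfoType"]].agg(join_values)

# Option 2: replace NaN with sentinel values before grouping.
filled = (
    df.fillna({"Col2": -1, "Col4": "NA"})
    .groupby(keys)[["InfoType"]]
    .agg(join_values)
)

# Fraction of rows that have a NaN in at least one key column.
share_with_nan = df[keys].isna().any(axis=1).mean()

print(len(by_col1), len(by_default), len(kept), len(filled), share_with_nan)
```

Checking the share of rows with a NaN in any key column on the real dataframe would confirm or rule out this explanation: a value of 1.0 means every row is dropped when grouping by all of those columns together, even though each subset of columns may still leave some rows.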