
I'm first going to show what I mean with an example:

Let's suppose that I have this dataframe:

  column_1 column_2 last_column
0       no    spain           A
1       no   france           A
2       no    italy           A
3       no  germany           B

If I group the dataframe by everything except the penultimate column (column_2), I want to end up with this:

(the values of column_2 collected into one row per (column_1, last_column) group)

  column_1                column_2 last_column
0       no  [spain, france, italy]           A
1       no               [germany]           B

And if I have a dataframe where column_1 and last_column have the same values in every row, I don't need to "join" or "append" the values in column_2; I just want an empty dataframe.

Does what I mean make sense? What I have so far is this:

import pandas as pd

data = {'column_1': ['no', 'no', 'no', 'no'],
        'column_2': ['spain', 'france', 'italy', 'germany'],
        'last_column': ['A', 'A', 'A', 'B']}

df = pd.DataFrame.from_dict(data)

# Drop column_2, then keep only the rows whose (column_1, last_column) group has a single row
aux = df.drop(columns=['column_2'])
indices_to_keep = aux.groupby(aux.columns.to_list()).filter(lambda x: len(x) < 2).index
df_to_keep = df.filter(items=indices_to_keep.to_list(), axis=0)

My problem with this code is that I don't know how to join the values into a single row when the df is grouped.
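With the sample data above, df_to_keep ends up holding only the row whose (column_1, last_column) group occurs once, roughly:

print(df_to_keep)
#   column_1 column_2 last_column
# 3       no  germany           B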


1 Answer


I think you can just aggregate into a list:

# group by the columns that stay, collect column_2 into lists, then restore the original column order
df.groupby(
    ['column_1', 'last_column']
)['column_2'].agg(list).reset_index()[
    ['column_1', 'column_2', 'last_column']
]
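With the sample data from the question, this expression yields (column widths approximate):

  column_1                column_2 last_column
0       no  [spain, france, italy]           A
1       no               [germany]           B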
  • oh, I didn't know that was possible! awesome! any idea of how to preserve the order of the columns? @Matt I'm trying by adding sort=False but it still shows the same – Tonino Fernandez Dec 14 '22 at 14:58
  • Thanks @Matt. Is there any way to filter the rows when the length of the result of the aggregation is 1? I mean, for this case, suppose "last_column" = A. Then I don't want aggregation since everything is the same. – Tonino Fernandez Dec 14 '22 at 16:20
  • if you let this answer be new_df, then you can just filter down to those rows where the list column has length greater than 1: new_df[new_df['column_2'].apply(len) > 1] ([see here](https://stackoverflow.com/questions/41340341/how-to-determine-the-length-of-lists-in-a-pandas-dataframe-column)) – Matt Dec 14 '22 at 19:48
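Putting the answer and the last comment together, a minimal sketch (building on df from the question; new_df is just the name used in the comment) could look like this:

# aggregate column_2 into lists per (column_1, last_column) group, keeping the original column order
new_df = df.groupby(['column_1', 'last_column'])['column_2'].agg(list).reset_index()
new_df = new_df[['column_1', 'column_2', 'last_column']]
# keep only the groups where more than one value of column_2 was collected
new_df = new_df[new_df['column_2'].apply(len) > 1]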