0

I have a df as follows:

A B C
a b c1
a c c2
b a c3
a b c4

Now, basing on columns A and B I would like to get an output like this:

A B C
a b [c1, c4]
a c c2
b a c3

So, if there are the same values in columns A and B of the record, I want to merge the values in column C into one structure - array would be the best. How can I achieve such a thing? Thanks in advance

Jakub Sapko
  • 304
  • 2
  • 15

1 Answers1

0
import pandas as pd

data = {'A': ['a', 'a', 'b', 'a'], 'B': ['b', 'c', 'a', 'b'], 'C': ['c1', 'c2', 'c3', 'c4']}

df = pd.DataFrame(data)

a_vals = df.A.unique()
b_vals = df.B.unique()

new_df = pd.DataFrame(columns=['A', 'B', 'C'])

for a_val in a_vals:
    for b_val in b_vals:
        df_tmp = df.loc[(df['A'] == a_val) & (df['B'] == b_val)]
        if len(df_tmp.index) > 0:
            new_c_val = '['
            for index, row in df_tmp.iterrows():
                new_c_val = new_c_val + str(row['C']) + ','
            new_c_val = new_c_val[:-1] + ']'
            new_data = {'A': a_val, 'B': b_val, 'C': new_c_val}
            new_df = new_df.append(new_data, ignore_index=True).reindex()
print(new_df)