2

My data looks like this

Name     Title     ID
ED       HHH     11111
ED       HHH     11112
ED       HHH     11113
ED       HHH     11114
AD       BBB     11119
AD       BBB     11133

Where I want it to look like this (where all the ID's are in one cell seperated by a comma for example)

Name     Title     ID
ED       HHH     11111, 11112, 11113, 11114
AD       BBB     11119, 11133

So I've tried a few different panda options, but it seems like most of them would try and put the IDs in their on columns where I do not want that.

I have also combined the Names and Title field and tried the code below but I get 'cannot label index with a null key' and I'm not sure this option would work any way.

df_long.pivot(index='Combined', values='ID')

I'm also trying it like this (below) but its taken over 6 hours (1.5 million rows) and still not complete and that's not ideal performance when there's probably a better way.

for x in df.values:
for y in df1_list:
    if x[2] == y[2]:
        if x[3] == y[3]:
            if x[4] not in y[4]:
                y[4].append(x[4])

any idea or direction for my problem to use pandas or another solution outside of pandas?

Gil5Ryan
  • 793
  • 1
  • 8
  • 17

1 Answers1

2

This shows how to use lambda to concatenate multiple cells together in pandas: Combine two columns of text in dataframe in pandas/python

It can be used in the aggfunc argument of pd.pivot_table like this:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# here i copied your data frame and assign it to the variable "df"
df = pd.read_clipboard()
df['ID'] = df['ID'].astype('str')
pd.pivot_table(df,index=['Name','Title'],values='ID',aggfunc=lambda x: ', '.join(x))
Out[15]:
Name  Title
AD    BBB                    11119, 11133
ED    HHH      11111, 11112, 11113, 11114
Name: ID, dtype: object

You may have to sort the indexes afterwards; it appears pandas picks a different way than what you showed as your desired output.

Community
  • 1
  • 1
measureallthethings
  • 1,102
  • 10
  • 26