-1

I have a dataframe with 4 columns. The 3 first colums are only useful to use as a group by for me. I want to get all the possible combinations of Event numbers for 1 Employee No/Client Number/Date. As an example, in the photo below :

https://i.stack.imgur.com/5r3vQ.png

This is the output i would want to get :

https://i.stack.imgur.com/JiroJ.png

Note that for me the order is not important, meaning that the combination 123,4567 is the same as the combination 4567,123. So if there was let's say 5 cases of 123,4567 and 8 cases of 4567,123 i would want only one line with 123,4567 and 13.

Any idea ? I'm still new to Python and kind of stuck!

Thank you very much :)

Edit :

This code seems to be working :


import pandas as pd
import time
from collections import Counter
from itertools import chain, combinations

import sys
sys.path.append('C:/Config Python')
import config
import pyodbc  
import pandas as pd
import numpy as np

pd.options.display.max_colwidth = 150
  
#Build teradata connection function  
def td_connect(usr, pwd, DRIVER = 'XXX', DBCNAME = 'YYY'):  
    try:  
        conn_td = pyodbc.connect(DRIVER=DRIVER, DBCNAME=DBCNAME, UID=usr, PWD=pwd, autocommit = True)  
        return conn_td  
    except IOError as e:  
        print('I/O error !')   
        

          
#Give the query you wish to run           
sql = """ 

The code is here

"""  
  
#Put td login information  
conn = td_connect(usr=config.username,pwd=config.password)  
  
#get data  
df = pd.read_sql(sql, conn)  

df

gp = df.groupby(['Employee no', 'Client number', 'Date'])

d = dict()
for name, group in gp:
    l = group['Event Number'].to_list()    
    try:
        d[len(l)].append(l)
    except KeyError:
        d[len(l)] = [l]
d

meets = []
for i in d.keys():
    meets.append(Counter(chain.from_iterable(combinations(line, i) for line in d[i])))
    
print(meets)

1 Answers1

0

Inspired from Concatenate strings from several rows using Pandas groupby

df['Combinations'] = df.groupby(['Employee no', 'Client number', 'Date'])['Event Number'].transform(lambda x: ",".join(x))
df['Counts'] = df.groupby(['Employee no', 'Client number', 'Date']).counts()['Event number'] 
result = df[['Employee no', 'Client number', 'Date', 'Combinations', 'Counts']].drop_duplicates()
marcant0
  • 189
  • 8
  • Thank you very much! I'm at the first line. It doesn't seem to work well with floats.. it puts a , between all the numbers – Nydia Morin-Rivest Jun 22 '20 at 14:29
  • Acknowledging that you use a ```pandas.read_csv```, add the parameters ```seperator=","```. Nevertheless, please provide the code you're using as @Alper asked earlier in comment. You can edit your previous post. – marcant0 Jun 22 '20 at 14:32