I have a dataframe with 4 columns. The 3 first colums are only useful to use as a group by for me. I want to get all the possible combinations of Event numbers for 1 Employee No/Client Number/Date. As an example, in the photo below :
https://i.stack.imgur.com/5r3vQ.png
This is the output i would want to get :
https://i.stack.imgur.com/JiroJ.png
Note that for me the order is not important, meaning that the combination 123,4567 is the same as the combination 4567,123. So if there was let's say 5 cases of 123,4567 and 8 cases of 4567,123 i would want only one line with 123,4567 and 13.
Any idea ? I'm still new to Python and kind of stuck!
Thank you very much :)
Edit :
This code seems to be working :
import pandas as pd
import time
from collections import Counter
from itertools import chain, combinations
import sys
sys.path.append('C:/Config Python')
import config
import pyodbc
import pandas as pd
import numpy as np
pd.options.display.max_colwidth = 150
#Build teradata connection function
def td_connect(usr, pwd, DRIVER = 'XXX', DBCNAME = 'YYY'):
try:
conn_td = pyodbc.connect(DRIVER=DRIVER, DBCNAME=DBCNAME, UID=usr, PWD=pwd, autocommit = True)
return conn_td
except IOError as e:
print('I/O error !')
#Give the query you wish to run
sql = """
The code is here
"""
#Put td login information
conn = td_connect(usr=config.username,pwd=config.password)
#get data
df = pd.read_sql(sql, conn)
df
gp = df.groupby(['Employee no', 'Client number', 'Date'])
d = dict()
for name, group in gp:
l = group['Event Number'].to_list()
try:
d[len(l)].append(l)
except KeyError:
d[len(l)] = [l]
d
meets = []
for i in d.keys():
meets.append(Counter(chain.from_iterable(combinations(line, i) for line in d[i])))
print(meets)