In the current project I am working on I need to check a df_totals
dataframe against a df_fact
and give an output to the user as the df_output
in the image below of which lines should be deleted so that df_fact
would reconcile with df_totals
.
PROBLEM: Basically I want to try to find combinations of values in the df_fact['VAL']
by a company that needs to be deleted so that the SUM of VAL by COMPANY in df_fact should be equal to df_totals['TOTALS']
.
I believe it is a complex solution because such an algorithm to solve it will possibly do so through the use of combinations, due to the fact that I need to delete one or more lines by company names.
What I'm trying to do
Here you can see the code I have written so far to try to solve it. I have created a merged dataframe to calculate the differences and iterate over them.
import pandas as pd
# group fact by company
df_fact_agg = df_fact.groupby('COMPANY', as_index=False).sum()
df_fact_agg.head()
# merge the two dataframes
df_compare = pd.merge(
df_totals,
df_fact_agg,
left_index=True,
right_index=True
)
# clean the dataframe
df_compare.drop(['COMPANY_y'], axis=1, inplace=True)
df_compare.rename(columns={"COMPANY_x": "COMPANY", "TOTALS": "LEFT_TOTALS", "VAL": "RIGHT_VAL"}, inplace=True)
# create a new column to calculate the difference between values
df_compare['Diff'] = df_compare['RIGHT_VAL']-df_compare['LEFT_TOTALS']
# try to find find a way to solve it
for val in df_compare.COMPANY.unique().tolist():
# for each unique name of company get the current difference
difference_value = df_compare[df_compare['COMPANY']==val].Diff.values[0]
# get all values from series
series_differences = df_fact[df_fact['COMPANY']==val]['VAL']
print('The TOTAL difference calculated is: ', difference_value)
print('The serie that I need to work to check whether value(s) need to be deleted is: ', series_differences.values)
#####################################################################
########## from now on I do not know what to do #####################
################## please help me =) ################################
#####################################################################
The output
The TOTAL difference calculated is: 400.0
The serie that I need to work to check whether value(s) is (are) missing or remaining is: [5000. 3000. 400.]
The TOTAL difference calculated is: 724.0
The serie that I need to work to check whether value(s) is (are) missing or remaining is: [ 50. 5432. 7894. 674.]
The TOTAL difference calculated is: 895.0 The serie that I need to work to check whether value(s) is (are) missing or remaining is: [ 896. 4578. 20. 547. 6348. 875.]
The TOTAL difference calculated is: 14451.0
The serie that I need to work to check whether value(s) is (are) missing or remaining is: [ 967. 678. 645. 87. 9564. 3564.]
The data I am using
df_totals
{'COMPANY': {0: 'COMPANY 1', 1: 'COMPANY 2', 2: 'COMPANY 3', 3: 'COMPANY 4'},
'TOTALS': {0: 8000.0, 1: 13326.0, 2: 12369.0, 3: 1054.0}}
df_fact.columns
['_KEY', 'COMPANY', 'DOC NUM', 'VAL']
df_fact.values
[['COMPANY 1DOC 123', 'COMPANY 1', 'DOC 123', 5000.0],
['COMPANY 1DOC 124', 'COMPANY 1', 'DOC 124', 3000.0],
['COMPANY 1DOC 125', 'COMPANY 1', 'DOC 125', 400.0],
['COMPANY 2DOC 126', 'COMPANY 2', 'DOC 126', 50.0],
['COMPANY 2DOC 127', 'COMPANY 2', 'DOC 127', 5432.0],
['COMPANY 2DOC 128', 'COMPANY 2', 'DOC 128', 7894.0],
['COMPANY 2DOC 129', 'COMPANY 2', 'DOC 129', 674.0],
['COMPANY 3DOC 130', 'COMPANY 3', 'DOC 130', 896.0],
['COMPANY 3DOC 131', 'COMPANY 3', 'DOC 131', 4578.0],
['COMPANY 3DOC 132', 'COMPANY 3', 'DOC 132', 20.0],
['COMPANY 3DOC 133', 'COMPANY 3', 'DOC 133', 547.0],
['COMPANY 3DOC 134', 'COMPANY 3', 'DOC 134', 6348.0],
['COMPANY 3DOC 135', 'COMPANY 3', 'DOC 135', 875.0],
['COMPANY 4DOC 136', 'COMPANY 4', 'DOC 136', 967.0],
['COMPANY 4DOC 137', 'COMPANY 4', 'DOC 137', 678.0],
['COMPANY 4DOC 138', 'COMPANY 4', 'DOC 138', 645.0],
['COMPANY 4DOC 139', 'COMPANY 4', 'DOC 139', 87.0],
['COMPANY 4DOC 140', 'COMPANY 4', 'DOC 140', 9564.0],
['COMPANY 4DOC 141', 'COMPANY 4', 'DOC 141', 3564.0]]
df_output
{'CHECK COLUMNS': {0: nan,
1: nan,
2: 'DELETE',
3: 'DELETE',
4: nan,
5: nan,
6: 'DELETE',
7: nan,
8: nan,
9: 'DELETE',
10: nan,
11: nan,
12: 'DELETE',
13: nan,
14: 'DELETE',
15: 'DELETE',
16: nan,
17: 'DELETE',
18: 'DELETE'}}