
I have written code to extract record information from a folder of old CSV files into dictionary A and from a folder of new CSV files into dictionary B. Each dictionary looks like the following; every entry stores the record itself, the row it came from, and the name of the file it came from:

{'MC1003-1513846743.67153296': {'row': 2, 'record': ['MC1003-1', '5138467', '43.67', '15', '', '', '', '', '', '', '', '', '', '', '3296'], 'file_name': 'Timecard-MC1003-1-20220425103004.csv'}, 'MC1003-1546339635.95153296': {'row': 3, 'record': ['MC1003-1', '5463396', '35.95', '15', '', '', '', '', '', '', '', '', '', '', '3296'], 'file_name': 'Timecard-MC1003-1-20220425103004.csv'}}

I am trying to compare the two dictionaries, each of which contains over a thousand records from the old and new folders. What is the best way to find the records in dictionary A that are not present in dictionary B and, the other way around, the records in dictionary B that are not present in dictionary A?

The code below loops over every record in dictionary A, compares it with every record in dictionary B, and prints the pair whenever the two records being compared differ. What I actually want is to output a record from dictionary A only if it doesn't match any of the records in dictionary B. Right now it outputs all the records. What am I doing wrong? Here dir_A_dict and dir_B_dict are the dictionaries that I have read in.

for a in dir_A_dict.keys():
    row_a = dir_A_dict[a].get('row')
    result_a = dir_A_dict[a].get('record')
    name_a = dir_A_dict[a].get('file_name')
    for b in dir_B_dict.keys():
        row_b = dir_B_dict[b].get('row')
        result_b = dir_B_dict[b].get('record')
        name_b = dir_B_dict[b].get('file_name')
        
        if result_a != result_b:
            print("Record", result_a,"in file",name_a, "is different from", result_b,"in file", name_b)

The output of this code looks like the following. In this case, since the record in dictionary A is clearly present in dictionary B, the code should move on to the next record in dictionary A and check whether that record is present in dictionary B as well:

Record ['MC1003-1', '5138467', '43.67', '15', '', '', '', '', '', '', '', '3296'] in file Timecard-MC1003-1-20220425100254-Reported.csv is different from ['MC1003-1', '5138467', '43.67', '15', '', '', '', '', '', '', '', '', '', '', '3296'] in file Timecard-MC1003-1-20220425103004.csv
Record ['MC1003-1', '5138467', '43.67', '15', '', '', '', '', '', '', '', '3296'] in file Timecard-MC1003-1-20220425100254-Reported.csv is different from ['MC1003-1', '5463396', '35.95', '15', '', '', '', '', '', '', '', '', '', '', '3296'] in file Timecard-MC1003-1-20220425103004.csv
Bilal Hussain
  • Does this answer your question? [Recursive diff of two dictionaries (keys and values)?](https://stackoverflow.com/questions/5903720/recursive-diff-of-two-dictionaries-keys-and-values) – sytech May 15 '22 at 00:55
  • No, it doesn't answer my question – Bilal Hussain May 15 '22 at 02:52

1 Answer


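Just pile all of the records from a dict into a set. Each record is a list, which is not hashable, so convert it to a tuple before adding it (see the comments):

# adict is dictionary A (dir_A_dict in the question)
aset = set()
for v in adict.values():
    # store the whole record as one tuple, not its individual fields
    aset.add(tuple(v['record']))

# ... build bset from dictionary B the same way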

Then you have a simple way to query what records belong to dictionary A that are not in dictionary B:

# aset is generated from dictionary A
# bset is generated from dictionary B

in_a_not_b = aset - bset
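
To also report which file and row each unmatched record came from, one option is to key a dict by the record tuple and keep the file name and row as the value. The following is only a minimal sketch under assumptions: `index_records` and `missing_from` are hypothetical helper names, and the sample data is made up in the question's shape with shortened records. It also avoids the problem in the question's nested loop, which prints a line for every (A, B) pair that differs rather than only for records with no match at all.

```python
def index_records(records_dict):
    """Map each record (as a hashable tuple) to the (file_name, row) places it came from."""
    index = {}
    for entry in records_dict.values():
        key = tuple(entry['record'])
        index.setdefault(key, []).append((entry['file_name'], entry['row']))
    return index


def missing_from(source, other):
    """Return (file_name, row, record) for every record in `source` that is absent from `other`."""
    source_index = index_records(source)
    other_index = index_records(other)
    missing = []
    # dict key views support set operations; sorted() only makes the order stable
    for record in sorted(source_index.keys() - other_index.keys()):
        for file_name, row in source_index[record]:
            missing.append((file_name, row, record))
    return missing


# Made-up sample data (illustrative only, records shortened for readability)
dir_A_dict = {
    'a1': {'row': 2, 'record': ['MC1003-1', '5138467', '43.67', '15', '', '3296'],
           'file_name': 'old.csv'},
    'a2': {'row': 3, 'record': ['MC1003-1', '9999999', '10.00', '15', '', '3296'],
           'file_name': 'old.csv'},
}
dir_B_dict = {
    'b1': {'row': 2, 'record': ['MC1003-1', '5138467', '43.67', '15', '', '3296'],
           'file_name': 'new.csv'},
    'b2': {'row': 3, 'record': ['MC1003-1', '5463396', '35.95', '15', '', '3296'],
           'file_name': 'new.csv'},
}

for file_name, row, record in missing_from(dir_A_dict, dir_B_dict):
    fields = ' '.join(field for field in record if field)
    print(f'Dictionary A had a record in file "{file_name}" with row number {row} '
          f'that was not present in Dictionary B. Record is: {fields}')
```

Calling `missing_from(dir_B_dict, dir_A_dict)` gives the other direction. Note that this compares records exactly: in the output shown in the question, the record from dictionary A has fewer empty fields than the one from dictionary B, so the two would count as different. If such records should be treated as equal, the key could be normalized first, for example with `tuple(f for f in entry['record'] if f)`, but that is an assumption about what "the same record" means for this data.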
alex
  • But how would I know which records were different? I need to be able to say the following in the output: Dictionary A had a record in file "FolderA_Timecard-MC1010-19-20220507140122-Reported" with row number 2 that was not present in Dictionary B. Record is: MC1010-19 21044174 58.55 12341 – Bilal Hussain May 15 '22 at 01:22
  • Also, the code you gave raises the following error with my dictionary, pointing to the last line: tuple indices must be integers or slices, not str – Bilal Hussain May 15 '22 at 01:24
  • I parsed the dictionary as Dictionary[string, Dictionary[...]]; hence me iterating through the values of the dictionary (each of which is itself a dictionary), then grabbing the value corresponding to the `record` key of that dictionary. EDIT: I had written `items` and not `values`, sorry. – alex May 15 '22 at 01:29
  • I still don't understand what makes each record unique from one another; if you're using the entire `(key, value)` pair for uniqueness, just pile the entire `(key, value)` pair into sets and the solution still holds. – alex May 15 '22 at 01:31
  • I wrote `items`, when I meant to write `values`. See the above edit. – alex May 15 '22 at 01:34
  • Thanks for the edit. I ran it and it gives me a bunch of numbers that don't make any sense as follows: {'19863005', '87.08', '21473743', '19998100', '1298927', '21733624', '19608971', '2395', '21899888', '5676488', '3784324', '21297281', '21081273', '21377816', '21243004', '21689483', '5525422', '21621085', '21718567', '21426239', '21494788', '21704734', '20856012', '60.27', '19093897', '21907214', '8538917', '21244875', '21846514', '21379912', – Bilal Hussain May 15 '22 at 01:38
  • How do I pile the entire (key, value) pair into sets? – Bilal Hussain May 15 '22 at 01:38
  • Thanks for the help so far. Currently, I am just getting a bunch of numbers in in_a_not_b. I would like to know which record is present in one dictionary but not in the other, and have a way to associate the record that's missing from a dictionary with a filename – Bilal Hussain May 15 '22 at 01:44
  • The list itself is a record, then? In that case it should just be `aset.add(tuple(v['record']))` – alex May 15 '22 at 01:46
  • That gives me the records thanks but then again, how do I know which filename and row these records belong to? {('MC1066-1', '21540458', '6.70', '16.00', '', '', '', '', '', '', '', '466'), ('MC1012-S5', '21234998', '70.20', '15.00', '', '', '', '', '', '', '', '11217'), ('MC1009-29', '21128486', '32.08', '8.35', '', '', '', '', '', '', '', '7484'), – Bilal Hussain May 15 '22 at 01:55
  • In my dictionary, alongside the record field, there's also the row number indicating which row of the CSV file the record came from, and the filename of the CSV file that was converted into the dictionary – Bilal Hussain May 15 '22 at 01:56
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/244747/discussion-between-alex-and-bilal-hussain). – alex May 15 '22 at 01:56
  • I see there is a lot of discussion here already, so perhaps OP and @alex are finding a solution. But, as an alternative approach, consider reading each dict into a dataframe and concatenating the two dataframes (with some thought about the best way to do that if the keys differ between the two dicts); a `drop_duplicates(keep=False)` call on the combined dataframe will then leave only the records that appear once. – bici.sancta May 15 '22 at 15:52