
I have code that uses a dictionary whose keys are generated from the date in each CSV file's name. 'frames' holds the data read from each CSV, and the dict key specifies which CSV file it came from. A new key is added to the dictionary daily.

frames['2020-01-01'] and frames['2020-01-02'] are two different CSVs, which I use in the code below to count how the ids differ between the two files.

>dict_keys(['2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05','2020-01-06'])
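A minimal sketch of how such a dict could be built, assuming (hypothetically) that all the CSVs sit in one folder and each file is named after its date, e.g. `2020-01-01.csv`. The function name `load_daily_frames` is illustrative and not from the question.

```python
from pathlib import Path

import pandas as pd


def load_daily_frames(folder):
    """Read every CSV in `folder` into a dict keyed by the file name's stem.

    Assumes files are named like '2020-01-01.csv', so each key is the
    ISO date taken from the file name.
    """
    frames = {}
    # sorted() so the dict is built in chronological order for ISO date names
    for path in sorted(Path(folder).glob("*.csv")):
        frames[path.stem] = pd.read_csv(path)
    return frames
```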

With the following code I can get the number of ids that are new in the 2020-01-02 file, i.e. not present in the 2020-01-01 file (counting only rows whose specimenid starts with 'aa'):

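# rows of 2020-01-02 whose id is not in 2020-01-01, counting those whose specimenid starts with 'aa'
new_total = frames['2020-01-02'][~frames['2020-01-02']['id'].isin(frames['2020-01-01']['id'])].specimenid.str.startswith('aa').sum()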
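To count ids that are new on the later day, the mask has to be applied to the later day's frame: `curr[~curr['id'].isin(prev['id'])]` keeps the rows of `curr` whose id is absent from `prev`. A minimal sketch wrapping that in a helper, run on toy data; the name `count_new_ids` and the `prefix` parameter are hypothetical, while the column names `id` and `specimenid` come from the question.

```python
import pandas as pd


def count_new_ids(prev_df, curr_df, prefix="aa"):
    """Count rows of curr_df whose 'id' is absent from prev_df and whose
    'specimenid' starts with `prefix`."""
    is_new = ~curr_df["id"].isin(prev_df["id"])
    # Summing a boolean Series counts the True values
    return int(curr_df.loc[is_new, "specimenid"].str.startswith(prefix).sum())


# Toy data standing in for frames['2020-01-01'] and frames['2020-01-02']
day1 = pd.DataFrame({"id": [1, 2, 3], "specimenid": ["aa1", "aa2", "bb3"]})
day2 = pd.DataFrame({"id": [2, 3, 4, 5], "specimenid": ["aa2", "bb3", "aa4", "bb5"]})
print(count_new_ids(day1, day2))  # → 1 (ids 4 and 5 are new, only 'aa4' matches the prefix)
```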

So new_total is the number of ids that are new on 2020-01-02 compared with 2020-01-01.

But how can I do this for each pair of consecutive dates, without having to hard-code the date keys for every pair?

For example, instead of returning a single number, like:

print("newly added:", new_total)
> newly added: 373

I would like this instead:

print(f"on date {date}: {new_total}")
> on date 2020-01-01: 0
> on date 2020-01-02: 373
> on date 2020-01-03: 201
> on date 2020-01-04: 590

Would you compare the most recently added key with the one before it? How can this be done, if it is possible? Any help would be greatly appreciated.
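A minimal sketch of the pairwise comparison, assuming the keys are ISO `YYYY-MM-DD` strings (as in the `dict_keys` output above), so sorting them as strings also sorts them chronologically. `zip(dates, dates[1:])` walks adjacent pairs, so each date is compared only with the date before it, and the first date is reported as 0 because it has nothing to compare against. The helper names `count_new_ids` and `new_ids_per_date` are hypothetical, and the `frames` dict here is toy data.

```python
import pandas as pd


def count_new_ids(prev_df, curr_df, prefix="aa"):
    # Rows of curr_df whose id is absent from prev_df and whose specimenid starts with prefix
    is_new = ~curr_df["id"].isin(prev_df["id"])
    return int(curr_df.loc[is_new, "specimenid"].str.startswith(prefix).sum())


def new_ids_per_date(frames, prefix="aa"):
    """Return {date: count of new ids compared with the previous date}."""
    dates = sorted(frames)  # ISO date strings sort chronologically
    result = {}
    if dates:
        result[dates[0]] = 0  # nothing earlier to compare the first date with
    for prev, curr in zip(dates, dates[1:]):
        result[curr] = count_new_ids(frames[prev], frames[curr], prefix)
    return result


frames = {
    "2020-01-01": pd.DataFrame({"id": [1, 2], "specimenid": ["aa1", "aa2"]}),
    "2020-01-02": pd.DataFrame({"id": [2, 3, 4], "specimenid": ["aa2", "aa3", "bb4"]}),
    "2020-01-03": pd.DataFrame({"id": [3, 5, 6], "specimenid": ["aa3", "aa5", "aa6"]}),
}
for date, total in new_ids_per_date(frames).items():
    print(f"on date {date}: {total}")
# on date 2020-01-01: 0
# on date 2020-01-02: 1
# on date 2020-01-03: 2
```

If only the latest comparison is needed, `dates[-2]` and `dates[-1]` give the last pair directly. On Python 3.10+, `itertools.pairwise(dates)` is an equivalent way to produce the adjacent pairs.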
