I have about 750 .csv files, and each line in every file holds a single UUID. My goal for this script is to count how many unique UUIDs exist across all 750 or so files. The file names are structured like this:
DATA-20200401-005abf4e3f864dcb83bd9030e63c6da6.csv
As you can see, each name contains a date and some random ID. The files are all in the same directory and all have the same extension. Each file is newline-delimited and contains nothing but UUIDs that look like the following: b0d6e1e9-1b32-48d5-b962-671664484616
I tried merging all the files, but things got messy, since this is about 15 GB of data.
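For a sense of scale, here is a rough back-of-the-envelope estimate of how many lines that is. It assumes every line is a 36-character UUID plus a one-byte newline (37 bytes) and that 15 GB means 15 * 10**9 bytes; the helper name estimate_line_count is made up for illustration.

```python
def estimate_line_count(total_bytes, bytes_per_line=37):
    """Estimate the number of lines, assuming fixed-width lines.

    37 bytes = 36 characters of UUID text + 1 newline byte (an assumption;
    files with \r\n line endings would use 38 bytes per line).
    """
    return total_bytes // bytes_per_line


# Assumed total size: 15 GB taken as 15 * 10**9 bytes.
print(estimate_line_count(15 * 10**9))
```

Under those assumptions the data comes to roughly 400 million lines, which is why the choice between an in-memory and an on-disk approach matters here.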
My end goal is output that states the number of unique IDs across all the files. For example:
file1:
xxx-yyy-zzz
aaa-bbb-ccc
xxx-yyy-zzz
file2:
xxx-yyy-zzz
aaa-bbb-ccc
xxx-yyy-zzz
The final output after scanning these two files would be:
The total number of unique ids is: 2
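To make the goal concrete, here is a minimal Python sketch of the kind of script I have in mind. It is only a sketch under some assumptions: the function name count_unique_uuids and the glob pattern "DATA-*.csv" are my own choices based on the file name above, lines that do not parse as a UUID are silently skipped, and all distinct IDs are held in memory at once. Each ID is stored as its 16-byte binary form rather than the 36-character string to save space, but this may still need a lot of RAM if most of the IDs are distinct.

```python
import glob
import os
import uuid


def count_unique_uuids(directory, pattern="DATA-*.csv"):
    """Count distinct UUIDs across every file in `directory` matching `pattern`.

    Files are read one line at a time, so no file is ever loaded whole;
    only the set of distinct IDs stays in memory.
    """
    seen = set()
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, "r", encoding="ascii", errors="replace") as handle:
            for line in handle:
                text = line.strip()
                if not text:
                    continue  # ignore blank lines
                try:
                    # Store the 16-byte binary form instead of the text.
                    seen.add(uuid.UUID(text).bytes)
                except ValueError:
                    continue  # assumption: skip lines that are not valid UUIDs
    return len(seen)
```

It would be called on the directory holding the files, e.g. print(f"The total number of unique ids is: {count_unique_uuids('.')}"). Note that uuid.UUID treats upper- and lower-case hex as the same ID, and that the placeholder values in the example above (xxx-yyy-zzz) are not valid UUIDs, so they would be skipped by this sketch.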