I have a large CSV file, around 24 million rows, and I want to reduce its size.
Here is a small preview of the CSV:
I want to remove rows that share the same CIK and IP. I have a lot of these files and they take up a lot of space, so I'm looking for an efficient way to remove the duplicates.
I ran a test to count how many duplicates there are per CIK, and for some CIKs there are more than 100k, which is why I want to remove them.
I've tried a few approaches, but most of them failed because of the size of the CSV.
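To make the goal concrete, here is a minimal sketch of the kind of thing I'm after: stream the file row by row and keep only the first row for each (CIK, IP) pair, so the whole file never has to be loaded at once. The column names `cik` and `ip`, the function name `drop_duplicate_rows`, and the file paths are assumptions for illustration and would need to match the real header. It also assumes the set of distinct pairs fits in memory, which may not hold for every file.

```python
import csv


def drop_duplicate_rows(src_path, dst_path, key_columns=("cik", "ip")):
    """Copy src_path to dst_path, keeping the first row for each key.

    key_columns are assumed header names; adjust them to the real file.
    Returns the number of data rows written.
    """
    seen = set()  # holds one tuple per distinct (cik, ip) pair
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:  # one row in memory at a time
            key = tuple(row[col] for col in key_columns)
            if key in seen:
                continue  # duplicate pair, skip it
            seen.add(key)
            writer.writerow(row)
            kept += 1
    return kept
```

Calling `drop_duplicate_rows("log.csv", "log_dedup.csv")` (both file names hypothetical) would write a deduplicated copy next to the original. Only the keys are held in memory, not the rows, so memory use grows with the number of distinct pairs rather than with the file size.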