I have a huge Python list, about 100 MB in size, containing strings and integers. Some of the strings appear two or three times. I have tried to remove the duplicates with this code:
from collections import OrderedDict
duplicates = [.......large size list of 100 MB....]
remove = OrderedDict.fromkeys(duplicates).keys()
print(remove)
This works well on small lists, but on this large list it has been running for a whole day and has still not finished. Any suggestions on how this can be done in minutes, or at most a few hours? I have also tried installing CUDA on Ubuntu to speed this up, but I keep getting errors: see here
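For reference, here is a minimal sketch of the same order-preserving de-duplication on a small list, written as a single pass with a set instead of OrderedDict. The function name `dedupe_keep_order` and the sample data are made up for illustration, and it assumes every element is hashable, which holds for strings and integers.

```python
def dedupe_keep_order(items):
    """Return items with duplicates removed, keeping the first occurrence of each."""
    seen = set()      # values already encountered
    result = []       # unique values in their original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical small sample standing in for the real 100 MB list.
sample = ["a", 1, "b", "a", 1, "a", 2]
unique = dedupe_keep_order(sample)
print(unique)
```

If the original order does not need to be kept, `list(set(sample))` is a shorter way to get the unique values.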