I have a text file, let's call it "input.txt". It has probably a billion different words, and I'm trying to write a Python script that removes every duplicate and then creates a new file, let's call it "output.txt", containing a clean version of input.txt without any duplicates.
I thought of using Python because it feels light and fast, but I'm not sure it can handle that much data: about a billion lines of random words.
At the same time, I'm not sure whether I should be using something else, or even how to start my Python script. So far I'm here:
with open("input.txt") as infile:
    ...
And I'm just stuck. I have never used Python for anything like this, and I'm unsure how to continue.
An example of what the input looks like:
red
red1
red2
red
RED@
ReD
RED@
red
ReD
and so on... (random words, but case sensitive)
and the desired output should be:
red
red1
red2
RED@
ReD
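In case it helps to make the question concrete, here is a minimal sketch of what I imagine the script doing (assuming first occurrences should be kept in their original order and comparison is exact/case-sensitive; the function name is just a placeholder):

```python
def dedupe_file(src, dst):
    """Write each distinct line of src to dst, keeping first occurrences in order."""
    seen = set()  # words already written (exact, case-sensitive comparison)
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            word = line.rstrip("\n")
            if word not in seen:
                seen.add(word)
                outfile.write(word + "\n")

# recreate the example input from above
with open("input.txt", "w") as f:
    f.write("\n".join(["red", "red1", "red2", "red",
                       "RED@", "ReD", "RED@", "red", "ReD"]) + "\n")

dedupe_file("input.txt", "output.txt")
```

My worry is whether a set holding a billion distinct words fits in memory: short Python strings cost tens of bytes each plus set overhead, so that could run to tens of gigabytes or more. If that's too much, maybe an on-disk approach (e.g. an external sort) is needed, though a plain sort would lose the original line order.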
Any help is highly appreciated.
Thanks!