If this is an operation you will perform more than once, may I suggest threading? The following is some pseudo-code.
First, split the files up into 100,000 line files in Linux:
> split -l 100000 usernames.txt usernames_
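To make the chunk naming concrete: split appends two-letter suffixes (aa, ab, ac, ...) to the prefix you give it. A quick sketch with a synthetic 250,000-line file:

```shell
# create a sample file of 250,000 lines, then split into 100,000-line chunks
seq 250000 > usernames.txt
split -l 100000 usernames.txt usernames_
ls usernames_*
# usernames_aa  usernames_ab  usernames_ac
```

The last chunk simply holds the remaining 50,000 lines.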
Then, spawn some threads to process the chunks in parallel.
import threading

usernames_one = set()
usernames_two = set()
filereaders = []

# Define this class, which reads every line of its file into the given set
class Filereader(threading.Thread):
    def __init__(self, filename, username_set):
        threading.Thread.__init__(self)
        self.filename = filename
        self.username_set = username_set

    def run(self):
        with open(self.filename) as f:
            for line in f:
                self.username_set.add(line.strip())
# loop through the usernames_ chunk files, and spawn a thread for each:
import glob
for chunkname in sorted(glob.glob('usernames_*')):
    f = Filereader(chunkname, usernames_one)
    filereaders.append(f)
    f.start()
# do the same loop for usernames_two
# at the end, wait for all threads to complete
for f in filereaders:
f.join()
# then do simple set intersection with & (note: ^ is the symmetric
# difference, i.e. the usernames in exactly one of the sets):
common_usernames = usernames_one & usernames_two
# then write the common set to a file:
with open("common_usernames.txt", 'w') as common_file:
    common_file.write('\n'.join(common_usernames))
You'll have to check whether set.add is thread-safe on your interpreter (in CPython it is effectively atomic thanks to the GIL, but that's an implementation detail). If it isn't guaranteed, you can of course give each thread its own set (one per file it processes), and at the end union them all before intersecting.
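That per-thread-set variant could be sketched as follows (read_all and the chunk-name argument are illustrative, not part of the code above — each thread fills its own private set, and the main thread unions them once everything has joined):

```python
import threading

# Each thread owns its set, so no two threads ever mutate the same object.
class Filereader(threading.Thread):
    def __init__(self, filename):
        threading.Thread.__init__(self)
        self.filename = filename
        self.usernames = set()

    def run(self):
        with open(self.filename) as f:
            for line in f:
                self.usernames.add(line.strip())

def read_all(chunk_names):
    threads = [Filereader(name) for name in chunk_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # union the per-thread sets only after every thread has finished
    return set().union(*(t.usernames for t in threads))
```

Run read_all once per original username list, then intersect the two resulting sets exactly as before.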