I apologize for not providing information on my attempts earlier (I wasn't trying to ask for free code; I just got stuck and needed some guidance).
Essentially, I had a txt document with 700,000 words in paragraph form, and I wanted to count the words and cross-reference them against another document that was in list form. I got this far:
fname = raw_input("Enter file name: ")
fh = open(fname)
inp = fh.read().upper()
fh.close()
# Strip punctuation so that "word," and "word" count as the same token
for ch in '.,?-_;:!()/|&[]%+*@=><{}~"':
    inp = inp.replace(ch, '')
# Every word in the text, duplicates included
new_fh11 = inp.split()
# Unique words in alphabetical order
new_fh12 = sorted(set(new_fh11))
for word in new_fh12:
    print new_fh11.count(word), word
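Calling `count()` once per unique word rescans the whole word list every time, which gets slow with 700k words and 34k unique ones. A minimal sketch of the same counting step in a single pass, assuming Python 3 and the standard library's `collections.Counter`; the names `PUNCTUATION` and `count_words` are made up for illustration, and the sample sentence stands in for the real file contents:

```python
from collections import Counter

# The same punctuation characters that the replace() calls remove.
PUNCTUATION = '.,?-_;:!()/|&[]%+*@=><{}~"'

def count_words(text):
    """Return a Counter mapping each upper-cased word to how often it occurs."""
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans('', '', PUNCTUATION)
    return Counter(text.upper().translate(table).split())

# Stand-in for fh.read(); the real text would come from the file.
counts = count_words("The cat, the dog; the end.")
for word in sorted(counts):
    print(counts[word], word)
```

The printed format matches the loop above (count first, then the word), but each word is tallied as it is read instead of being searched for afterwards.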
At this point I was prepared to use LibreOffice Base to do my comparison using 2 tables, but even with the count function that reduced my word count from 700k to 34k, entering the data crashed the program whenever I tried to upload it. So I had to come up with code that would let me compare the two txt files in Python, which handles this volume of data nicely. I really had NO idea where to even begin: I knew of a few merge functions, but I didn't know how to define the merge. I ended up doing this instead:
# new_fh11 is already a list of words (it was split above), so sort it directly
new_fh12 = sorted(new_fh11)
for x in new_fh12:
    print x
Then I took this list and put it into Excel in one column, added my second list to another column, and used the COUNTIF function to count and compare the two lists.
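For anyone who wants to skip the spreadsheet, the COUNTIF comparison can probably be done in Python as well. This is only a sketch, assuming Python 3, that both lists are already in memory as lists of words, and that matching should ignore case; `cross_reference` and the sample data are hypothetical, not what I actually ran:

```python
from collections import Counter

def cross_reference(text_words, reference_words):
    """For each reference word, report how many times it occurs in text_words."""
    counts = Counter(w.upper() for w in text_words)
    # A Counter returns 0 for a missing key, like COUNTIF when nothing matches.
    return [(w, counts[w.upper()]) for w in reference_words]

# Stand-ins for the cleaned word list and the second (list-form) document.
text_words = "THE CAT THE DOG THE END".split()
reference_words = ["the", "dog", "bird"]
for word, n in cross_reference(text_words, reference_words):
    print(word, n)
```

With the real data, `reference_words` would be read from the second txt file, for example one word per line, which is an assumption about how that file is laid out.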