0

Can anyone please direct me on how to perform the following merge in Python...

Text File #1:

5     apple
1     banana
3     pear
4     kiwi

(plus a few thousand more entries)

My text file #2 looks like

apple
orange    
strawberry
banana

I want to combine the two text files such that I only add those which are in both but also keep the original numbers from the text file #1 link to the corresponding identifier. So in this example, my merge would look like this:

5     apple
1     banana
martineau
  • 119,623
  • 25
  • 170
  • 301
Kay
  • 11
  • 2
    Welcome to SO! You are encouraged to take the [SO tour](http://stackoverflow.com/tour) where you can learn what this site is for. What it is not for is asking someone to write code for free. Have you attempted anything in Python to try to read in your files and represent them as lists that can have list operations conducted on them? If so, you will get more help if you post what you have done so far. Good luck. – Shawn Mehan Apr 26 '17 at 21:46
  • Try this http://stackoverflow.com/questions/3925614/how-do-you-read-a-file-into-a-list-in-python and this http://stackoverflow.com/questions/1388818/how-can-i-compare-two-lists-in-python-and-return-matches – brennan Apr 26 '17 at 21:52

2 Answers2

1

Here is one possible approach:

Edit: taking the comment into consideration

I would first read your text file #1 into a Python Dictionary

d = dict()
with open("file1.txt") as f:
    for line in f:
       (val, key) = line.split()
       d[key] = int(val)

print d

Out: {'kiwi': 4, 1: 'banana', 3: 'pear', 4: 'kiwi', 5: 'apple', 'pear': 3, 'banana': 1, 'apple': 5}

Now, we read file 2 as a Python list

with open("file2.txt") as f:
    l = f.read().splitlines()

print l
Out: ['apple', 'orange', 'strawberry', 'banana']

Now, create another dictionary with the desired output:

d2 = {key:val for key,val in d.iteritems() if key in l}
print d2
Out: {'apple': 5, 'banana': 1}

I will leave it up to you to figure out how to write a dictionary into a text file. I would use pandas to convert it into a dataFrame and write the dataFrame as csv or a tsv. This is a workaround, there has to be a more direct way to do it.

Rakesh Adhikesavan
  • 11,966
  • 18
  • 51
  • 76
  • 1
    Right idea, terrible execution. `d` should have the fruit as keys, not as values. Also, `l2` is undefined. – alexis Apr 26 '17 at 22:05
0

I apologize for not providing information on my attempts earlier (I wasn't trying to ask for codes for free, just got stuck and needed some guidance).

Essentially I had a txt document with 700,000 words in paragraph form and I wanted to count the words and cross-reference it to another document which was in list form. I got this far

fname = raw_input("Enter file name: ")
fh = open(fname)
inp = fh.read().upper()
new_fh2 = inp.replace('.','').replace(',','').replace('?','')
new_fh3 = new_fh2.replace('-','').replace('_','').replace(';','')
new_fh4 = new_fh3.replace(':','').replace('!','').replace('(','')
new_fh5 = new_fh4.replace(')','').replace('/','')
new_fh6 = new_fh5.replace('|','').replace('&','').replace('[','')
new_fh7 = new_fh6.replace(']','').replace('%','').replace('+','')
new_fh8 = new_fh7.replace('*','').replace('@','').replace('=','')
new_fh9 = new_fh8.replace('>','').replace('<','')
new_fh10 = new_fh9.replace('{','').replace('}','').replace('~','')
new_fh11 = new_fh10.replace('"','').split()
new_fh12 = sorted(set(new_fh11)) 
for word in new_fh12:
    print new_fh11.count(word), word`

At this point I was prepared to use LibreOffice Base to do my comparison using 2 tables, but even with the count function that reduced by word count from 700k to 34k, entering in data crashed the program whenever I tried to upload. So I had to try and think of a code which would allow me to compare the two txt files in python, which handles this volume of data nicely. And I really had NO idea where to even begin, although I did know of a few merge functions I just didn't know how to define the merge. I ended up doing this instead

new_fh12 = new_fh11.split()
new_fh12.sort()
for x in sorted(new_fh12):
    print x

then I took this list and put it into excel in one column, added my second list to another column, then used the countif function to count and compare the two lists.

Kay
  • 11