This is my problem: I have 2 files that I want to read, printing each line of the second file with the matching description from the first file appended:

File 1:

ID1 desc1
ID2 desc2
ID3 desc3
ID4 desc4

File 2:

ID1 random1
ID5 random5
ID6 random6

What I would like to get is:

ID1 random1  desc1
ID5 random5  desc5
ID6 random6  nothing

However, my current code:

address = {}

with open('address.txt', 'r') as f:
    rows = (line.rstrip().split('\t') for line in f)
    address = { row[0]:row[1:] for row in rows }

    for key, value in address.items():

        with open('families.txt', 'r') as f:    
            for line in f.readlines():
                line = line.rstrip('\n')
                line = line.split('\t')
                if line[0] == key: 
                    line.append(str(address[key]))
                    print ('\t'.join(line))
                else:
                    line.append('nothing')
                    print ('\t'.join(line))

However, the output repeats in a loop instead:

ID1 random1  desc1
ID5 random5  nothing
ID6 random6  nothing
ID1 random1  nothing
ID5 random5  desc5
ID6 random6  nothing

Also, it would be nice if someone could suggest the best way to discard the square brackets that are printed around the 'value' from my dictionary at the end of each line.

E. Ducateme
gusa10
  • Is File1 'address.txt', and File2 'families.txt'? Your question doesn't make that clear. And why are you doing `for key, value in address.items():`? Just loop over the contents of 'families.txt' and use the ID of each line to test if it's in the `address` dictionary. – PM 2Ring Aug 06 '17 at 10:36

3 Answers

1

Try it like this:

with open('address.txt') as fh1:
    data1 = {j[0]: j[1] for j in [i.strip().split('\t') for i in fh1.readlines()]}

with open('families.txt') as fh2:
    data2 = {j[0]: j[1] for j in [i.strip().split('\t') for i in fh2.readlines()]}

result = {k: [v, data1[k]] if k in data1 else [v, 'nothing'] for k, v in data2.items()}
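
The `result` dictionary is built but never printed. A minimal sketch of writing it out in the tab-separated layout from the question might look like the following; the sample `result` and the helper name `format_rows` are hypothetical stand-ins, not part of the code above.

```python
# Hypothetical stand-in for the `result` dict built from the two files.
result = {
    'ID1': ['random1', 'desc1'],
    'ID5': ['random5', 'nothing'],
    'ID6': ['random6', 'nothing'],
}


def format_rows(merged):
    """Return one tab-separated line per key: the ID followed by its values."""
    return ['\t'.join([key] + values) for key, values in merged.items()]


for row in format_rows(result):
    print(row)
```

Dictionaries keep insertion order in Python 3.7 and later, so the lines come out in the order they appear in 'families.txt'.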
zipa
  • Just a note: It looks to me that this is looping only over the items in data2 (families). If there are items in data1(address) which are not in data2 (families), they will not be included. From the example the OP gives, it looks like this is OK; only the OP knows if that is always true. – Basya Aug 06 '17 at 10:53
  • He said what he'd like to get and this does the job :) – zipa Aug 06 '17 at 11:24
  • As I said, it fits the example. I just thought it should be clear what it does not do, as an example is usually just that... – Basya Aug 06 '17 at 11:40
0

I think you would be better off reading each file into a dictionary. Don't reread the second file in the "for" loop.

Then, create a third dictionary.

Iterate over dict1:

for key, value in dict1.iteritems(): #python 2.7

or

for key, value in dict1.items(): #python 3

You can then create a third dictionary which would use the same keys, but the value would be a tuple. While iterating over dict1, every key is present there, so the first part of the tuple is its value. The second part is the value from the second dict if the key exists there, and "nothing" if it doesn't.

Then iterate over dict2 and do the same, but check each key first: if it is already in the new dict, it was processed in the first pass, so just continue.

 if key in new_dict:
     continue

Once you have this new dict, you can format it any way you want. This post gives a lot of formatting options.
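
Put together, the steps described here could be sketched roughly as follows. The function name `merge_with_default` and the sample dictionaries are made up for illustration, and the sketch assumes both files have already been read into plain ID-to-string dictionaries.

```python
def merge_with_default(dict1, dict2, default='nothing'):
    """Combine two dicts into one whose values are (dict1 value, dict2 value) tuples."""
    merged = {}
    # First pass: every key of dict1, paired with dict2's value or the default.
    for key, value in dict1.items():
        merged[key] = (value, dict2.get(key, default))
    # Second pass: keys that only exist in dict2.
    for key, value in dict2.items():
        if key in merged:
            continue  # already handled in the first pass
        merged[key] = (default, value)
    return merged


# Hypothetical sample data standing in for the two files.
addresses = {'ID1': 'desc1', 'ID2': 'desc2'}
families = {'ID1': 'random1', 'ID5': 'random5'}

merged = merge_with_default(addresses, families)
for key, (desc, rand) in merged.items():
    print('\t'.join([key, rand, desc]))
```

Unlike a loop over only the second file, this keeps IDs that appear in either file, so ID2 is printed with "nothing" in place of its missing family entry.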

Basya
0

I removed several items that were unnecessary and hopefully cleaned up a few things...

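I removed references to '\n' and '\t' because the .rstrip() and .split() methods, called without arguments, strip and split on any whitespace, which includes those characters. Note that this also splits on spaces, so it assumes the fields themselves contain no spaces.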

with open('address.txt', 'r') as f:
    rows = [line.rstrip().split() for line in f]

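I took advantage of unpacking in the comprehension's for clause to unpack the first and second items on each row into x and y for insertion into your dictionary. This stores each description as a plain string rather than a list, and it assumes every row has exactly two fields.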

    address = { x: y for x, y in rows }


with open('families.txt', 'r') as f:
    for line in f.readlines():
        line = line.rstrip().split()

In this case, there was no need to loop over the lines in the families file AND the items in the address dictionary. Dictionaries are optimized for looking up keys, so we simply loop over the families file and do lookups in the dictionary, as we go.

        if line[0] in address:
            line.append(str(address[line[0]]))
            print('\t'.join(line))
        else:
            line.append('nothing')
            print('\t'.join(line))
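
As a possible variation, the two branches can be collapsed with `dict.get`, which returns a default when the key is missing. The sketch below is an assumption-laden rewrite rather than the code above: the function name `annotate` is hypothetical, `io.StringIO` objects stand in for the two files, and the address lines are split only on the first tab so that descriptions may contain spaces. Storing the description as a string instead of the list slice `row[1:]` is what keeps the square brackets out of the output.

```python
import io


def annotate(address_lines, family_lines, default='nothing'):
    """Append the matching address description (or a default) to each family line."""
    address = {}
    for line in address_lines:
        if not line.strip():
            continue  # skip blank lines
        # Split once on the first tab: ID on the left, description on the right.
        key, _, desc = line.rstrip('\n').partition('\t')
        address[key] = desc

    out = []
    for line in family_lines:
        if not line.strip():
            continue
        fields = line.rstrip('\n').split('\t')
        # dict.get replaces the if/else: use the description if present, else the default.
        fields.append(address.get(fields[0], default))
        out.append('\t'.join(fields))
    return out


# In-memory stand-ins for 'address.txt' and 'families.txt'.
address_file = io.StringIO('ID1\tdesc1\nID2\tdesc2\n')
families_file = io.StringIO('ID1\trandom1\nID5\trandom5\n')

for row in annotate(address_file, families_file):
    print(row)
```

With real files, the two `io.StringIO` objects would be replaced by the handles returned from `open('address.txt')` and `open('families.txt')`.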
E. Ducateme