
I have two input files, input.txt and datainput.txt. I check whether the 2nd column of input.txt matches the 1st '|'-separated field of each '>' header line in datainput.txt (without the '>'). If they match, I append the corresponding orthodb_id (the 1st column of input.txt) to the end of that row in the output file.

input.txt:

5 21 218
6 11 1931
7 26 173

datainput.txt:

>21|95|28|5
Computer
>11|28|5|5
Cate 
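In datainput.txt, header lines start with '>' and the lines after them (Computer, Cate) are sequence lines. A minimal sketch of how the script derives the key from a header, using a hypothetical helper header_key that mirrors the slicing in the code:

```python
def header_key(line):
    """Return (fields, key) for a '>' header line, or None for a sequence line.

    header_key is a hypothetical helper, not part of the original script.
    """
    if not line.startswith(">"):
        return None  # sequence line such as "Computer"
    fields = line.strip().split("|")  # ">21|95|28|5" -> ['>21', '95', '28', '5']
    return fields, fields[0][1:]      # drop the leading '>' to get the key '21'


print(header_key(">21|95|28|5\n"))  # (['>21', '95', '28', '5'], '21')
print(header_key("Computer\n"))     # None
```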

code.py:

import csv

# Map column 2 of input.txt to column 1 (the orthodb_id), e.g. '21' -> '5'
with open('input.txt') as file1:
    file1_data = dict(line.split(None, 2)[1::-1] for line in file1 if line.strip())

with open('datainput.txt') as file2, open('output.txt', 'w', newline='') as outputfile:
    output = csv.writer(outputfile, delimiter='|')
    for line in file2:
        if line[:1] == '>':
            row = line.strip().split('|')
            key = row[0][1:]
            if key in file1_data:
                output.writerow(row + [file1_data[key]])
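The dict(...) line works because str.split(None, 2) splits on whitespace into at most three fields, and the slice [1::-1] takes the first two in reverse order, giving (column 2, column 1) pairs. A sketch of the same step written out as a loop; build_lookup is a hypothetical name used only for illustration:

```python
def build_lookup(lines):
    """Map column 2 of each input.txt line to column 1 (the orthodb_id).

    build_lookup is a hypothetical helper equivalent to the dict(...) expression.
    """
    lookup = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, as the generator expression does
        # "5 21 218" -> ['5', '21', '218'] -> [1::-1] -> ['21', '5']
        key, orthodb_id = line.split(None, 2)[1::-1]
        lookup[key] = orthodb_id
    return lookup


sample = ["5 21 218\n", "6 11 1931\n", "7 26 173\n"]
print(build_lookup(sample))  # {'21': '5', '11': '6', '26': '7'}
```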

This is the output I get with my code; the sequence lines from datainput.txt (Computer, Cate) are missing:

>21|95|28|5|5
>11|28|5|5|6
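These header rows are produced by csv.writer, which rejoins the split fields with the '|' delimiter. One detail worth knowing: the writer ends each row with the dialect's lineterminator, which is '\r\n' by default, so its rows will not match '\n'-terminated lines copied verbatim with write(). Passing lineterminator='\n' avoids the mix. A small in-memory demonstration using io.StringIO:

```python
import csv
import io

# Default dialect: rows end with '\r\n'
crlf_buf = io.StringIO()
csv.writer(crlf_buf, delimiter="|").writerow([">21", "95", "28", "5", "5"])
print(repr(crlf_buf.getvalue()))  # '>21|95|28|5|5\r\n'

# Explicit lineterminator: rows end with '\n'
lf_buf = io.StringIO()
csv.writer(lf_buf, delimiter="|", lineterminator="\n").writerow([">21", "95", "28", "5", "5"])
print(repr(lf_buf.getvalue()))  # '>21|95|28|5|5\n'
```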
You would be better off using BioPython to read the FASTA-format input (datainput.txt), or see [fastareader](http://stackoverflow.com/questions/7654971/parsing-a-fasta-file-using-a-generator-python) for a very naive example. – aar cee Mar 30 '13 at 13:23
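The generator approach mentioned in that comment can be sketched in plain Python without BioPython (BioPython's own route is Bio.SeqIO.parse(handle, "fasta")). This is a naive sketch, assuming each record is one '>' header followed by zero or more sequence lines; read_fasta is a hypothetical name:

```python
import io


def read_fasta(handle):
    """Yield (header, sequence_lines) pairs from a FASTA-style file object.

    Naive sketch: lines that appear before the first '>' header are ignored.
    """
    header, seq = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq  # emit the previous record
            header, seq = line, []
        elif header is not None:
            seq.append(line)       # sequence line belongs to the current header
    if header is not None:
        yield header, seq          # emit the last record


data = io.StringIO(">21|95|28|5\nComputer\n>11|28|5|5\nCate\n")
for header, seq in read_fasta(data):
    print(header, seq)
# >21|95|28|5 ['Computer']
# >11|28|5|5 ['Cate']
```

Working record by record makes it easy to decide, per header, whether its sequence lines should be written at all.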

1 Answer


You just need to add an else block to your code, so that every non-header line (one that does not start with '>') is copied to the output file unchanged:

import csv

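# Map column 2 of input.txt to column 1 (the orthodb_id), e.g. '21' -> '5'
with open('input.txt') as file1:
    file1_data = dict(line.split(None, 2)[1::-1] for line in file1 if line.strip())

with open('datainput.txt') as file2, open('output.txt', 'w', newline='') as outputfile:
    # lineterminator='\n' keeps header rows consistent with the copied sequence lines
    output = csv.writer(outputfile, delimiter='|', lineterminator='\n')
    for line in file2:
        if line[:1] == '>':
            # Header line: append the orthodb_id if its key appears in input.txt
            row = line.strip().split('|')
            key = row[0][1:]
            if key in file1_data:
                output.writerow(row + [file1_data[key]])
        else:
            # Sequence line: copy it to the output unchanged
            outputfile.write(line)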
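One caveat: the else branch copies every sequence line, so a sequence whose header has no match in input.txt still ends up in output.txt without its header. If unmatched records should be dropped entirely (an assumption, since the question does not say), a flag can remember whether the current header matched. A minimal sketch working on file objects so it can be tried with io.StringIO; annotate is a hypothetical name and the header >99|1|2|3 is made-up test data:

```python
import csv
import io


def annotate(lookup, fasta, out):
    """Append lookup[key] to matching '>' headers; drop records with no match."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    keep = False  # does the current record's header have a match?
    for line in fasta:
        if line[:1] == ">":
            row = line.strip().split("|")
            key = row[0][1:]
            keep = key in lookup
            if keep:
                writer.writerow(row + [lookup[key]])
        elif keep:
            out.write(line)  # sequence line of a matched record


lookup = {"21": "5", "11": "6", "26": "7"}
fasta = io.StringIO(">21|95|28|5\nComputer\n>99|1|2|3\nOrphan\n>11|28|5|5\nCate\n")
out = io.StringIO()
annotate(lookup, fasta, out)
print(out.getvalue(), end="")
# >21|95|28|5|5
# Computer
# >11|28|5|5|6
# Cate
```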
Once you are satisfied with this solution, please submit it to http://codereview.stackexchange.com to get some useful suggestions on how to improve it. – tshepang Mar 31 '13 at 20:31