How to enhance my dataset output using Python

Question

I have two input files, input.txt and datainput.txt. I check whether the 2nd column of input.txt matches the 1st column of datainput.txt, and if they match, I append its orthodb_id (the 1st column of input.txt) to the end of the relevant row in the output file.

input.txt:

5 21 218
6 11 1931
7 26 173

datainput.txt:

>21|95|28|5
Computer
>11|28|5|5
Cate

code.py:

import csv

with open('input.txt', 'rb') as file1:
    file1_data = dict(line.split(None, 2)[1::-1] for line in file1 if line.strip())

with open('datainput.txt', 'rb') as file2, open('output.txt', 'wb') as outputfile:
    output = csv.writer(outputfile, delimiter='|')
    for line in file2:
        if line[:1] == '>':
            row = line.strip().split('|')
            key = row[0][1:]
            if key in file1_data:
                output.writerow(row + [file1_data[key]])

This is the output I get with my code:

>21|95|28|5|5
>11|28|5|5|6

You would be better off using BioPython to read the FASTA-format input (the datainput file), or see [fastareader](http://stackoverflow.com/questions/7654971/parsing-a-fasta-file-using-a-generator-python) for a very naive example! — aar cee, Mar 30 '13 at 13:23
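For readers who want to see what the generator-based approach mentioned in the comment might look like, here is a minimal sketch. It is not BioPython's API and not the code from the linked answer; the function name read_fasta is hypothetical, and it assumes every record starts with a '>' header line followed by zero or more sequence lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-style lines.

    Hypothetical helper: assumes each record begins with a '>' line.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


# Sample data taken from the question's datainput.txt.
records = list(read_fasta([">21|95|28|5", "Computer", ">11|28|5|5", "Cate"]))
for header, sequence in records:
    print(header, sequence)
```

Reading whole records this way keeps each header together with its sequence, which makes it easier to decide per record what to write out.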

score 1 · Accepted Answer · answered Mar 31 '13 at 20:21

You just need to add an else block to your code, so that lines that do not start with '>' are written to the output unchanged:

import csv

with open('input.txt', 'rb') as file1:
    file1_data = dict(line.split(None, 2)[1::-1] for line in file1 if line.strip())

with open('datainput.txt', 'rb') as file2, open('output.txt', 'wb') as outputfile:
    output = csv.writer(outputfile, delimiter='|')
    for line in file2:
        if line[:1] == '>':
            row = line.strip().split('|')
            key = row[0][1:]
            if key in file1_data:
                output.writerow(row + [file1_data[key]])
        else:
            outputfile.write(line)
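Note that the code above opens the files in binary mode and compares the data against str values, which assumes Python 2. As a rough sketch only, the same logic could look like this on Python 3; the function names load_ids and annotate are made up for illustration, and the sample data is taken from the question. It works on lists of lines so it can be tried without any files.

```python
def load_ids(lines):
    """Map column 2 of each non-blank input.txt row to column 1."""
    ids = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            ids[parts[1]] = parts[0]
    return ids


def annotate(fasta_lines, ids):
    """Yield output lines: matched headers gain an ID, other lines pass through."""
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # The key is the first '|'-separated field, without the '>'.
            key = line[1:].split("|", 1)[0]
            if key in ids:
                yield line + "|" + ids[key]
            # As in the answer, an unmatched header is dropped,
            # but the sequence lines after it are still written.
        else:
            yield line


# Sample data from the question.
input_rows = ["5 21 218", "6 11 1931", "7 26 173"]
fasta = [">21|95|28|5", "Computer", ">11|28|5|5", "Cate"]

result = list(annotate(fasta, load_ids(input_rows)))
print("\n".join(result))
```

With real files, you would pass open('input.txt') and open('datainput.txt') in text mode instead of the lists, and write each yielded line followed by a newline to output.txt.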

Once you are satisfied with this solution, please submit it to http://codereview.stackexchange.com to get some useful suggestions on how to improve it. — tshepang, Mar 31 '13 at 20:31
