
I've been looking for a way to do this for over a day but could not find exactly what I need. I have two files. The FIRST file has NAMES in the first column, POSITIONS in the second and LETTERS in the third. The SECOND file has NAMES in the first column and STRINGS in the second. I have a loop that goes through each line of the FIRST file, finds the matching NAME in the SECOND file, goes to the POSITION and changes the STRING using the LETTER. The loop works perfectly, but I cannot keep the changes for the next LETTER.

For example:

FIRST FILE

NAME1   2   X
NAME1   5   Z
NAME1   7   J
NAME2   3   P
NAME2   6   D

SECOND FILE

NAME1   AAAAAAAAA
NAME2   BBBBBBB

I use STRING as input and create a NEWSTRING in the loop with the changed LETTER. When I print it inside the loop, after the first iteration I get:

AXAAAAAAA

And after the second:

AAAAZAAAA

What I am looking for is a kind of magical one-liner that does this inside the loop, something like STRING=NEWSTRING, so that my input in the next iteration will be NEWSTRING

AXAAAAAAA

and so it will generate

AXAAZAAAA

in the second loop
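The "magical one-liner" is simply assigning the result back to the same variable inside the loop. Python strings are immutable, so each slice-and-concatenate step returns a new string; rebinding the name to it makes the next iteration start from the changed value. A minimal sketch, where the helper name `apply_edits` and the hard-coded `(position, letter)` pairs are hypothetical stand-ins for the rows of the first file (positions 1-based):

```python
def apply_edits(string_, edits):
    """Apply (position, letter) edits one after another; positions are 1-based."""
    for pos, letter in edits:
        # Build the new string and rebind the same name to it (STRING = NEWSTRING),
        # so the next edit works on the already-changed string.
        string_ = string_[:pos - 1] + letter + string_[pos:]
    return string_


print(apply_edits("AAAAAAAAA", [(2, "X"), (5, "Z")]))            # AXAAZAAAA
print(apply_edits("AAAAAAAAA", [(2, "X"), (5, "Z"), (7, "J")]))  # AXAAZAJAA
```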

I have tried append, add, list, and a few more things, but none of them worked.

with open("FILE1.txt") as f:
    POS = f.readlines()
    for line in POS:
        columns = line.split()
        query = columns[0]
        locate = int(columns[1]) - 1
        newnuc = columns[2]
        oldnuc = columns[3]
        with open("FILE2.txt") as f:
            Sequo = f.readlines()
            for linex in Sequo:
                columnos = linex.split()
                querios = columnos[0]
                sequence = columnos[1]
                if query == querios:
                    newseqons = sequence[:locate] + newnuc + sequence[locate + 1:]
                    print(newseqons)
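Since the question mentions trying a list: one alternative (not from this thread) is to convert each string into a list of characters once. Lists are mutable, so `chars[i] = letter` changes them in place, and every later edit automatically sees the earlier ones; the list is joined back into a string only at the end. A minimal sketch, where `sequences` and `edits` are hypothetical in-memory stand-ins for the two files:

```python
# Hypothetical in-memory stand-ins for FILE2 (name -> string) and the FILE1 rows.
sequences = {"NAME1": "AAAAAAAAA", "NAME2": "BBBBBBB"}
edits = [("NAME1", 2, "X"), ("NAME1", 5, "Z"), ("NAME1", 7, "J"),
         ("NAME2", 3, "P"), ("NAME2", 6, "D")]

# Keep one mutable character list per name and change it in place.
chars = {name: list(seq) for name, seq in sequences.items()}
for name, pos, letter in edits:
    chars[name][pos - 1] = letter  # file positions are 1-based

# Join each character list back into a string.
result = {name: "".join(c) for name, c in chars.items()}
for name in sorted(result):
    print(name, result[name])
# NAME1 AXAAZAJAA
# NAME2 BBPBBDB
```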

Here is my new code, Patrick:

with open (r'C:\Users\Administrator\Desktop\Sequorro.txt') as f2:
    Sequo=f2.readlines()
    for linex in Sequo:
        columnos=linex.split()
        querios=columnos[0]
        sequence=columnos[1]
        d={}
        d.update({querios: sequence})
        print(d)
{'CRUP_004407-RA': 'AAAAAAAAA'}
{'CRUP_004416-RA': 'GGGGGGGGG'}
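In this version `d={}` is still inside the loop, so the dictionary is emptied on every line and only the last name survives; that also explains why the first line of the second file gets lost. Following Patrick Haugh's comment, the dictionary should be created once, before the loop. A minimal sketch, using `io.StringIO` with hypothetical sample contents as a stand-in for `Sequorro.txt` so it runs on its own:

```python
import io

# Stand-in for Sequorro.txt: one "name<TAB>sequence" record per line (sample data).
f2 = io.StringIO("CRUP_004407-RA\tAAAAAAAAA\nCRUP_004416-RA\tGGGGGGGGG\n")

d = {}  # create the dictionary ONCE, outside the loop
for linex in f2:
    columnos = linex.split()
    if not columnos:  # skip blank lines
        continue
    querios, sequence = columnos[0], columnos[1]
    d[querios] = sequence  # same effect as d.update({querios: sequence})

print(d)  # both names are now present
```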


with open (r'C:\Users\Administrator\Desktop\POS.txt') as f1:
    POS=f1.readlines()
    for line in POS:
        columns=line.split()
        query=columns[0]
        locate=(int(columns[1]))
        newnuc=columns[2]
        oldnuc=columns[3]
        oldstr=d[querios]
        d[querios]=oldstr[:locate-1] +newnuc +oldstr[locate:]
        print(d)

{'CRUP_004416-RA': 'GCGGGGGGG'}
{'CRUP_004416-RA': 'GCGGGGGGG'}
{'CRUP_004416-RA': 'GCGGGTGGG'}
{'CRUP_004416-RA': 'GCCGGTGGG'}
{'CRUP_004416-RA': 'GCCAGTGGG'}
{'CRUP_004416-RA': 'GCCAGTTGG'}
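In this second loop the lookup uses `querios`, which still holds the last name left over from the first loop, rather than `query` from the current line. That appears to be why every change lands in `CRUP_004416-RA`, including the changes meant for the other name. A minimal sketch of the loop keyed on `query`; the dictionary and rows are hypothetical sample data (the first row is the format quoted in the comments below):

```python
import io

# Dictionary as built from the second file (sample values).
d = {"CRUP_004407-RA": "AAAAAAAAA", "CRUP_004416-RA": "GGGGGGGGG"}

# Stand-in for POS.txt: name, 1-based position, new letter, old letter.
f1 = io.StringIO("CRUP_004407-RA\t2\tC\tA\nCRUP_004416-RA\t3\tC\tG\n")

for line in f1:
    columns = line.split()
    if not columns:  # skip blank lines
        continue
    query = columns[0]
    locate = int(columns[1])
    newnuc = columns[2]
    oldstr = d[query]  # look up THIS line's name, not the leftover `querios`
    d[query] = oldstr[:locate - 1] + newnuc + oldstr[locate:]

print(d)
```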

with open (r'C:\Users\Administrator\Desktop\Sequorooo.txt','w') as f2:
    for querios, sequence in sorted(d.items()):
        f2.write('{}{}'.format(querios, sequence))
        f2.close()

CRUP_004416-RAGCCAGTTGG
FatihSarigol
  • Are you trying to write back to the file? If so, you need a temp file or some other file technique. I can't tell exactly why you are having this issue since your indentation is wrong, but my guess is that you are re-reading file2 every time to get the string, and since you never saved the latest changes, you get these results. – MooingRawr Nov 01 '16 at 13:41
  • When you do `d={querios: sequence}` you are overwriting `d` and losing the value already in it. Instead, do something like `d.update({querios: sequence})` – Patrick Haugh Nov 02 '16 at 13:21
  • @Patrick Haugh I edited it to d={} followed by d.update({querios: sequence}), but that didn't change the rest: I still get the weird behaviour where it replaces the correct letters in the correct places but also applies the change to the previous letter (and when two changes overlap, it keeps the first one). As for unpacking, your update works for both parts, but the second part gives the error `unsupported operand type(s) for -: 'str' and 'int'` on the line d[name]=oldstr[:pos-1] + rep + oldstr[pos:]. I think my code also works, as long as I can fix this one last thing. Cheers – FatihSarigol Nov 02 '16 at 14:46
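MooingRawr's point about writing back to the input file can be handled by writing the new contents to a temporary file in the same directory and then swapping it into place. A minimal sketch using the standard library's `tempfile.NamedTemporaryFile` and `os.replace` (Python 3.3+); the helper name `write_back` and the demo file are hypothetical. The same sketch covers the writing problems in the question's last block: each record gets a tab and a newline (the `'{}{}'` format there drops both, hence `CRUP_004416-RAGCCAGTTGG`), and the file is closed by the `with` block instead of by an `f2.close()` inside the loop, which would close it after the first record.

```python
import os
import tempfile


def write_back(path, d):
    """Rewrite `path` with the name -> sequence pairs in `d`, via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        for name, seq in sorted(d.items()):
            tmp.write("{}\t{}\n".format(name, seq))  # tab between columns, one record per line
        tmp_name = tmp.name
    # The with-block has closed the temp file; swap it into place (atomic on POSIX).
    os.replace(tmp_name, path)


# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "FILE2.txt")
    with open(target, "w") as f:
        f.write("NAME1\tAAAAAAAAA\n")
    write_back(target, {"NAME1": "AXAAZAJAA"})
    with open(target) as f:
        contents = f.read()

print(contents, end="")
```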

1 Answer

with open('file2') as f2:
    d = {name: string_ for line in f2 for name, string_ in (line.split(),)}
    #Build a dictionary of names mapped to strings from the 2nd file

with open('file1') as f1:
    #Do the replacements on the dictionary for the rules in file1
    for line in f1:
        name, pos, rep, *_ = line.split()
        pos = int(pos)  # split() returns text; convert before doing arithmetic
        oldstr = d[name]
        d[name] = oldstr[:pos-1] + rep + oldstr[pos:]

with open('file2', 'w') as f2:
    for name, string_ in sorted(d.items()):
        #Write the new names and strings back, one tab-separated record per line
        f2.write('{}\t{}\n'.format(name, string_))
Patrick Haugh
  • Much appreciated! Just a silly newbie question about your code: is "name: string_" a special way of referring to a string, or do you mean my column that has the strings to be changed? I have no doubt this code should do the trick, but I'm struggling to implement it in my code. Cheers – FatihSarigol Nov 01 '16 at 20:28
  • @DragonRider Python lets you override names such as the builtin `str` or the standard-library module `string`. I don't want to do that, because then I couldn't use `string` in my code. `string_` is a name that looks like `string`, but the interpreter treats them as entirely separate names – Patrick Haugh Nov 01 '16 at 22:35
  • @DragonRider if you're asking about the colon, that is a dictionary comprehension, explained in detail here: http://stackoverflow.com/questions/14507591/python-dictionary-comprehension – Patrick Haugh Nov 01 '16 at 22:37
  • Thank you for your time! When I run it your way, the first part gives `ValueError: too many values to unpack (expected 2)` (Python 3.5.2), and the second part gives the same error with "expected 3". So I adapted it to my code instead: the dictionary gets built nicely (in my own terminology, d={querios: sequence}), and the replacement step runs too, but in my example file GGGGGGGGG should become GGCAGGTGG and instead it becomes GCCAGTTGG. The third step works fine, except that the tab disappears. – FatihSarigol Nov 02 '16 at 13:07
  • For the unpacking in the dictionary comprehension, change `line.split()` to `(line.split(),)`. As for the second problem, does your file 1 have any lines that are not 3 values separated by white space? – Patrick Haugh Nov 02 '16 at 13:13
  • I have added my new code and the outputs to my question. (And by the way I also lose my first line from the second file, but I think I can fix that somehow.) Thank you so much! – FatihSarigol Nov 02 '16 at 13:14
  • Lines in my file look like this: CRUP_004407-RA 2 C A (the fourth column is the old letter, but I don't use it), and the columns seem to be separated by tabs – FatihSarigol Nov 02 '16 at 13:20
  • That value still needs to go somewhere when we unpack it. Fortunately, we can do `name, pos, rep, *_ = line.split()` and absorb the rest of that line into the variable `_` – Patrick Haugh Nov 02 '16 at 13:24
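The two unpacking fixes from the comments can be seen together in a small end-to-end run. `for name, string_ in line.split()` fails because it iterates over the individual fields and tries to unpack each string into two names. Wrapping the list in a one-element tuple, `(line.split(),)`, makes the inner loop run exactly once with the whole list. `*_` then absorbs any extra columns, such as the old letter. A minimal sketch with hypothetical in-memory data in place of the two files (the fourth column is made up to mirror the format described above):

```python
import io

file2 = io.StringIO("NAME1\tAAAAAAAAA\nNAME2\tBBBBBBB\n")
file1 = io.StringIO(
    "NAME1\t2\tX\tA\nNAME1\t5\tZ\tA\nNAME1\t7\tJ\tA\n"
    "NAME2\t3\tP\tB\nNAME2\t6\tD\tB\n"
)

# The one-element tuple makes the inner loop run once, unpacking the
# whole [name, string] list into the two names.
d = {name: string_ for line in file2 for name, string_ in (line.split(),)}

for line in file1:
    name, pos, rep, *_ = line.split()  # *_ collects the extra old-letter column
    pos = int(pos)
    d[name] = d[name][:pos - 1] + rep + d[name][pos:]

for name, string_ in sorted(d.items()):
    print("{}\t{}".format(name, string_))
# NAME1	AXAAZAJAA
# NAME2	BBPBBDB
```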