
For the input

 ATTTGGC

 TGCCTTA

 CGGTATC

 GAAAATT

I want to extract the 3-mers from each line and collect them all into one final list. The output should look like:

[ATT, TTT, TTG, TGG, GGC, TGC, GCC...]

It should not contain fragments such as GC\n from the end of the first line or TA\n from the end of the second line.

def getKmersFromDna(Dna,k):
    kmer_list = []
    for i in range(len(Dna)-k+1):
        kmer_list.append(Dna[i:i+k])
    return list(kmer_list)

This gives output like ['CC\n', 'C\nG', '\nGT'], which I do not want.

Andrej Kesely
KPYTHON
  • You're gonna have to explain how you got your desired output from the given input. All I see is a bunch of characters (which looks like DNA but I shouldn't have to go learn about DNA to answer your question). How do you get your final output from that? – Error - Syntactical Remorse Jul 12 '19 at 13:38
  • So you want to ``.splitlines`` and ``.strip`` away the newline? – MisterMiyagi Jul 12 '19 at 13:39
  • The newline is part of the issue but his outputs still do not match his desired @MisterMiyagi – Error - Syntactical Remorse Jul 12 '19 at 13:40
  • @Error-SyntacticalRemorse His (badly formatted) example code shows that he is picking up the rest as well. The only issue is that ``Dna`` is the raw file content, not split by lines and stripped. – MisterMiyagi Jul 12 '19 at 13:43
  • @KPYTHON How are you reading the file? The default in Python is to read files linewise - you would have to explicitly request the entire content to get the shown behaviour. – MisterMiyagi Jul 12 '19 at 13:46

2 Answers

data = '''

 ATTTGGC

 TGCCTTA

 CGGTATC

 GAAAATT
 '''

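# Strip whitespace from each line and skip the blank ones.
for line in map(str.strip, data.splitlines()):
    if not line:
        continue
    # Zip the line with copies shifted by 1 and 2 to get overlapping 3-mers.
    print([''.join(c) for c in zip(line, line[1:], line[2:])])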

Prints:

['ATT', 'TTT', 'TTG', 'TGG', 'GGC']
['TGC', 'GCC', 'CCT', 'CTT', 'TTA']
['CGG', 'GGT', 'GTA', 'TAT', 'ATC']
['GAA', 'AAA', 'AAA', 'AAT', 'ATT']
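
The question asks for one combined list rather than one list per line. A minimal sketch of that variant, using the same zip idea generalised to any k, might look like the following. The name `kmers_flat` and the shortened sample input are illustrative assumptions, not part of the original answer.

```python
def kmers_flat(text, k=3):
    """Return every k-mer from every non-blank line of text as one list."""
    result = []
    for line in map(str.strip, text.splitlines()):
        if not line:
            continue  # skip blank lines between sequences
        # k copies of the line, each shifted by one more character;
        # zipping them together yields the overlapping k-mers.
        shifted = [line[offset:] for offset in range(k)]
        result.extend(''.join(chars) for chars in zip(*shifted))
    return result


# Hypothetical sample: the first two lines of the question's input.
sample = "ATTTGGC\n\nTGCCTTA\n"
print(kmers_flat(sample))
```

Because each line is stripped before it is sliced, no k-mer can contain a newline or span two lines.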
Andrej Kesely
  • I am just learning. This code is pretty complex to me. It will help me to advance through my course. Thanks! – KPYTHON Jul 12 '19 at 14:07

A very basic version could look like this:

def getKmersFromDna(Dna,k):
    dna_list = Dna.strip().split('\n')
    kmer_list = []
    for cur_dna in dna_list: # iterate over each line of the input
        for i in range(len(cur_dna)-k+1): # collect every k-mer in this line
            kmer_list.append(cur_dna[i:i+k])
    return kmer_list
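
If the sequences come from a file, the same loop can work directly on the lines as Python reads them, so the whole content never has to be split by hand. The sketch below is one possible way to do that, not the only one. The name `iter_kmers` is hypothetical, and `io.StringIO` stands in for a real file object so the example runs on its own.

```python
import io


def iter_kmers(lines, k):
    """Yield each k-mer from an iterable of lines (for example an open file)."""
    for line in lines:
        seq = line.strip()  # drop the trailing newline and surrounding spaces
        # Blank lines give an empty range, so they contribute nothing.
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]


# io.StringIO is used here in place of a file opened with open(...).
fake_file = io.StringIO(" ATTTGGC\n\n TGCCTTA\n")
kmers = list(iter_kmers(fake_file, 3))
print(kmers)
```

Stripping each line before slicing is what keeps fragments such as 'GC\n' out of the result.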
  • Thank you, Sir. This helps a lot. I am just learning, which is why I am running into these basic problems. – KPYTHON Jul 12 '19 at 14:04
  • @KPYTHON Please accept and upvote my answer if it helped you. [Link how to accept an answer](https://stackoverflow.com/help/someone-answers) – Durgesh Kumar Jul 13 '19 at 05:07