I am reading a FASTA file that has a format like this:
>gi|31563518|ref|NP_852610.1| microtubule-associated proteins 1A/1B light chain 3A isoform b [Homo sapiens]
MKMRFFSSPCGKAAVDPADRCKEVQQIRDQHPSKIPVIIERYKGEKQLPVLDKTKFLVPDHVNMSELVKIIRRRLQLNPTQAFFLLVNQHSMVSVSTPIADIYEQEKDEDGFLYMVYASQETFGF
I have to read the file and then calculate the JC (Jukes-Cantor) distance. For a pair of sequences, the JC distance is -3/4 * ln(1 - 4/3 * p), where p is the proportion of sites that differ between the pair.
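As a rough sketch of that formula, assuming the two sequences are already aligned and have the same length: the function name `jc_distance` is hypothetical, and the choice to return infinity once p reaches 3/4 (where the logarithm is no longer defined) is an assumption, not part of the assignment. The constants 3/4 and 4/3 are used exactly as stated above.

```python
import math


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance between two equal-length, aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    # p is the proportion of sites at which the two sequences differ.
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    p = diffs / len(seq1)
    # Assumption: treat p >= 3/4 as saturated, since ln(1 - 4/3 * p)
    # is undefined there.
    if p >= 0.75:
        return float("inf")
    # "+ 0.0" turns the -0.0 produced for identical sequences into 0.0.
    return -0.75 * math.log(1 - (4 / 3) * p) + 0.0
```

For example, two sequences of length 4 that differ at one site give p = 0.25. This sketch does not treat gap characters specially; they are compared like any other symbol.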
I have set up the skeleton but am unsure how to do the rest. After reading the file and calculating the Jukes-Cantor distances, I have to write them to a new output file as a table. Any help is much appreciated! Thanks. I am new to both Python and FASTA files.
def readData():
    filename = input("Enter the name of the FASTA file: ")
    seqs = []
    with open(filename, "r") as infile:
        pass  # TODO: parse the FASTA records into seqs
    return seqs


def calculateJC(x, y):
    if x == y:
        return 0
    else:
        return 1  # temporary placeholder


def calcDists(seqs):
    # Build a square matrix of pairwise distances.
    output = []
    for seq1 in seqs:
        newrow = []
        for seq2 in seqs:
            dist = calculateJC(seq1, seq2)
            newrow.append(dist)
        output.append(newrow)
    return output


def outputDists(distMat):
    pass  # TODO: write the table to the output file


def main():
    seqs = readData()
    distMat = calcDists(seqs)
    outputDists(distMat)


if __name__ == "__main__":
    main()
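The two unfinished steps in the skeleton, reading the records and writing the table, could look roughly like the sketch below. The names `parse_fasta` and `format_table` are hypothetical, and the tab-separated layout with four decimal places is only one possible reading of "a table". The parser assumes each record starts with a `>` header line and that the sequence may be wrapped over several lines.

```python
def parse_fasta(lines):
    """Return a list of (header, sequence) pairs from an iterable of lines."""
    records = []
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_table(names, matrix):
    """Return the distance matrix as tab-separated text with a header row."""
    rows = ["\t".join([""] + list(names))]
    for name, row in zip(names, matrix):
        rows.append("\t".join([name] + [f"{d:.4f}" for d in row]))
    return "\n".join(rows) + "\n"
```

An open file object can be passed directly to `parse_fasta`, since iterating over a file yields its lines, and the string returned by `format_table` can be written to the output file with a single `write` call. Because full FASTA headers are long, a shorter label such as the accession field may be easier to read as a row and column name.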