I have a large matrix in a gzip that looks something like this:
locus_1 mark1 0.0,0.0,0.0,0.0,0.0,0.4536,0.8177,0.4929,0.0,0.0
locus_2 mark2 0.0,0.0,0.0,0.0,0.0,0.5536,0.9177,0.2929,0.0,0.0
locus_3 mark2 0.0,0.0,0.1,0.0,0.0,0.9536,0.8177,0.2827,0.0,0.0
So, each row starts with two descriptors, followed by 10 values.
I simply want to parse out the first 5 values of this row, such that I have a matrix like this:
locus_1 mark1 0.0,0.0,0.0,0.0,0.0
locus_2 mark2 0.0,0.0,0.0,0.0,0.0
locus_3 mark2 0.0,0.0,0.1,0.0,0.0
I have made the following python script to parse this, but to no avail:
import gzip
import numpy as np
inFile = gzip.open('/home/anish/data.gz')
inFile.next()
for line in inFile:
cols = line.strip().replace('nan','0').split('\t')
data = cols[2:]
data = map(float,data)
gfpVals = data[:5]
print '\t'.join(cols[:6]) + '\t' + '\t'.join(map(str,gfpVals))
I simply get the error:
data = map(float,data)
ValueError: could not convert string to float: