I know there have been several answers to questions about splitting on multiple delimiters, but my issue is that I need to split on some delimiters and not on others. I have a file that contains the following:
((((((Anopheles_coluzzii:0.002798,Anopheles_arabiensis:0.005701):0.001405,(Anopheles_gambiae:0.002824,Anopheles_quadriannulatus:0.004249):0.002085):0,Anopheles_melas:0.008552):0.003211,Anopheles_merus:0.011152):0.068265,Anopheles_christyi:0.086784):0.023746,Anopheles_epiroticus:0.082921):1.101881;
It is in Newick format, so all the information is on one long line. What I would like to do is isolate every number that directly follows another number. For example, the first number I would like to isolate is 0.001405. I would like to put it in a list with all the other numbers that follow a number (rather than a name, etc.).
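To make the goal concrete, this is the list I am hoping to end up with for the sample line above. I wrote it out by hand, and the name `expected` is just for illustration. The values are kept as strings exactly as they appear in the file:

```python
# Hand-written target output for the sample line: every number that
# directly follows another number (after a closing parenthesis).
expected = [
    "0.001405",
    "0.002085",
    "0",  # a genuine 0 value that must be kept
    "0.003211",
    "0.068265",
    "0.023746",
    "1.101881",
]
print(len(expected))  # 7
```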
I tried to use the following code:
import re

with open("file.nh", "r") as f:
    for line in f:
        data = line
        z = re.findall(r"[\w']+", data)
The issue is that this splits the line on "." as well as on the other delimiters, which is a problem because all the numbers I need have decimal points.
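Here is a minimal reproduction of the problem on a short fragment of the line (the `fragment` and `tokens` names are only for this illustration):

```python
import re

# A short fragment of the Newick line above, used only for illustration.
fragment = "Anopheles_arabiensis:0.005701):0.001405,"

# \w matches letters, digits and "_" but not ".", so every decimal
# number is broken into its integer part and its fractional part.
tokens = re.findall(r"[\w']+", fragment)
print(tokens)
# ['Anopheles_arabiensis', '0', '005701', '0', '001405']
```

So 0.001405 comes back as the two separate items '0' and '001405', and I can no longer tell a real 0 value from the integer part of a decimal.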
I considered going along with this and converting the numbers in the list to ints and then removing all non-int values and 0 values. However, some of the files contain 0 as a value that needs to be kept.
So is there a way of choosing which delimiters to use and which to avoid when multiple delimiters are required?