

How can you write an if/else statement in Python for a file that contains both text and numbers?
Let's say I want to replace the values in the third-to-last column of the file below. I want an if/else statement that replaces any value <5, or a dot ".", with a zero and, if possible, uses that value as an integer in a sum.
A quick and dirty solution using awk would look like this, but I'm curious about how to handle this type of data with Python:

 awk -F"[ :]" '{if ( (!/^#/) && ($9<5 || $9==".") ) $9="0" ; print }'

So how do you solve this problem?
Thanks

Input file:

##Comment1
#Header
sample1 1   2   3   4   1:0:2:1:.:3
sample2 1   4   3   5   1:3:2:.:3:3
sample3 2   4   6   7   .:0:6:5:4:0



Desired output:

##Comment1
#Header
sample1 1   2   3   4   1:0:2:0:0:3
sample2 1   4   3   5   1:3:2:0:3:3
sample3 2   4   6   7   .:0:6:5:4:0
SUM = 5


Result so far

['sample1', '1', '2', '3', '4', '1', '0', '2', '0', '0', '3\n']
['sample2', '1', '4', '3', '5', '1', '3', '2', '0', '3', '3\n']
['sample3', '2', '4', '6', '7', '.', '0', '6', '5', '4', '0']


Here's what I have tried so far:

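import re

# Open with a context manager so the file is closed automatically
with open("inputfile.txt", "r") as data:
    for line in data:
        if not line.startswith("#"):
            nodots = line.replace(":.", ":0")
            # Split on tabs or colons (assumes the columns are tab-separated)
            final_nodots = re.split(r"\t|:", nodots)
            # Index 8 is the third-to-last value after the split
            if int(final_nodots[8]) < 5:
                final_nodots[8] = "0"
            print(final_nodots)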
  • How is the sum 5 in your example? – mad_ Sep 25 '18 at 19:59
  • Thanks for posting your effort, your input, and your desired output, but your question isn't very clear. How did you transform your input into your desired output? Why did `1` get turned into a `0`, and why did the `.` in the second sample turn into `0` when the first sample didn't? – MooingRawr Sep 25 '18 at 20:00
  • From what I can understand, you are trying to split on either `\t` or `:` and would eventually end up with an index-out-of-range error, which you haven't posted here. https://stackoverflow.com/questions/1059559/split-strings-with-multiple-delimiters – mad_ Sep 25 '18 at 20:18
  • Thanks for the comments. The sum should be computed from the third-to-last column. It should be the result of 0+0+5=5, after replacing any value <5 (or a dot) with 0. I'm sorry for the confusion; I know I made a mess trying to explain my question, so let me rephrase it. Looking at the third-to-last field in the "input file" ("1", ".", "5"), I want to convert each value to 0 if it is less than 5 or if it is a dot, AND use these numbers in any arithmetic operation afterwards. – Lucky Badger Sep 25 '18 at 20:35

1 Answer

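import re

sums = 0
# Open with a context manager so the file is closed automatically
with open("inputfile.txt", "r") as data:
    for line in data:
        if not line.startswith("#"):
            nodots = line.replace(".", "0")
            # Grab the colon-separated field and turn it into a list of characters
            final_nodots = list(re.findall(r"\d:.+\d+", nodots)[0])
            # Index 6 is the third-to-last value (assumes single-digit values;
            # the colons sit at the odd indices)
            if int(final_nodots[6]) < 5:
                final_nodots[6] = "0"
            print(final_nodots)
            sums += int(final_nodots[6])
print(sums)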

You were pretty close, but your final_nodots returns a split on : instead of a split on the first few numbers, so your 8 should have been a 3. After that, just add a sums counter to keep track of that slot.

['sample1 1   2   3   4   1', '0', '2', '0', '0', '3\n']

There are better ways to achieve what you want, but I just wanted to fix your code.
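One possible alternative, as a minimal sketch rather than a definitive solution: split off only the last whitespace-separated column, edit its third-to-last colon-separated value, and glue the line back together so the sample names and the other columns survive. The helper names `clean_line` and `process` are hypothetical, and the sketch assumes every data line ends in a colon-separated field of integers or dots. It touches only the third-to-last value, so any other dot on the line is left as it is.

```python
def clean_line(line, position=-3, threshold=5):
    """Return (rewritten_line, value) for one data line.

    Hypothetical helper: only the colon-separated value at `position`
    in the last column is changed; everything else is kept verbatim.
    """
    stripped = line.rstrip()
    # Find where the last whitespace-separated column starts
    cut = max(stripped.rfind(" "), stripped.rfind("\t")) + 1
    prefix, last = stripped[:cut], stripped[cut:]

    values = last.split(":")
    raw = values[position]
    # A dot counts as 0; anything below the threshold is also set to 0
    number = 0 if raw == "." else int(raw)
    if number < threshold:
        number = 0
    values[position] = str(number)
    return prefix + ":".join(values), number


def process(lines, position=-3, threshold=5):
    """Rewrite all data lines and sum the edited values."""
    out, total = [], 0
    for line in lines:
        # Pass comment, header, and blank lines through unchanged
        if line.startswith("#") or not line.strip():
            out.append(line.rstrip("\n"))
            continue
        new_line, number = clean_line(line, position, threshold)
        out.append(new_line)
        total += number
    return out, total


sample = """##Comment1
#Header
sample1 1   2   3   4   1:0:2:1:.:3
sample2 1   4   3   5   1:3:2:.:3:3
sample3 2   4   6   7   .:0:6:5:4:0
"""

rows, total = process(sample.splitlines())
print("\n".join(rows))
print("SUM =", total)
```

To run it on a real file, pass the open file object instead of `sample.splitlines()`, for example `with open("inputfile.txt") as data: rows, total = process(data)`.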

MooingRawr
  • I greatly appreciate it, and thanks for fixing it. Looking at the new output, we lost the sample names and the additional columns. How do we preserve that information too? – Lucky Badger Sep 25 '18 at 21:42
  • @LuckyBadger Sorry, I didn't notice before I logged off that I had pasted the wrong code. Updated with your code and regex, plus a minor index change. – MooingRawr Sep 26 '18 at 13:27