
So, I am very new to Python and am not sure my code is the most efficient, but I would be very appreciative if someone could explain why my script raises a "name not defined" error when I run it. I have a list of 300 gene names in a separate file, one name per line, and I want to read each line and store it as a string variable.

Within the script I have 600 variables: 300 labeled name_bitscore and 300 labeled name_length, one pair for each of the 300 names. I want to filter the names based on a condition. My script looks like this:

#!/usr/bin/python
with open("seqnames-test1-iso-legal-temp.txt") as f:
    for line in f:
        exec("b="+line+"_bitscore")
        exec("l="+line+"_length")
        if 0.5*b <= 2*1.05*l and 0.5*b >= 2*0.95*l:
            print line
ham_pb_length=2973
ham_pb_bitscore=2165
cg2225_ph_length=3303
cg2225_ph_bitscore=2278

etc. for the length and bitscore variables.

Essentially, what I am trying to do here is read line 1 of the file "seqnames-test1-iso-legal-temp.txt", which is ham_pb. Then I wanted to use the exec function to create the variables b=ham_pb_bitscore and l=ham_pb_length, so that I could test whether half the value of the gene's bitscore is within the range of double its length, with a 5% margin of error. Then repeat this for every gene, i.e. every line of the file "seqnames-test1-iso-legal-temp.txt".

When I execute the script, I get the error message:

Traceback (most recent call last):
  File "duplicatebittest.py", line 4, in <module>
    exec("b="+line+"_bitscore")
  File "<string>", line 1, in <module>
NameError: name 'ham_pb' is not defined

I made another short script to make sure I was using the exec function correctly that looks like this:

#!/usr/bin/python
name="string"
string_value=4
exec("b="+name+"_value")
print(name)
print(b)

And this returns:

string
4

So, I know that I can use exec to include a string variable in a variable declaration because b returns 4 as expected. So, I am not sure why I get an error in my first script.

I tested to make sure the variable line was a string by entering

#!/usr/bin/python
with open("seqnames-test1-iso-legal-temp.txt") as f:
    for line in f:
        print type(line)

And it returned the line

<type 'str'>

300 times, so I know each variable line is a string, which is why I don't understand why my test script worked, but this one did not.

Any help would be super appreciated!

asky
Louis

2 Answers


line is yielded by the text file iterator, which keeps the trailing newline on each line it reads.

So your expression:

exec("b="+line+"_bitscore")

is passed to exec as:

b=ham_pb
_bitscore

Strip the trailing newline and it will work:

exec("b="+line.rstrip()+"_bitscore")

provided that you also move the following lines before the loop, so the variables are defined before exec looks them up:

ham_pb_length=2973
ham_pb_bitscore=2165
cg2225_ph_length=3303
cg2225_ph_bitscore=2278
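
To make the failure visible, here is a minimal sketch (Python 3 syntax; io.StringIO is only my stand-in for the names file, and the two names are sample data) that builds the string handed to exec with and without stripping:

```python
import io

# Stand-in for the names file (assumption: one gene name per line).
fake_file = io.StringIO("ham_pb\ncg2225_ph\n")

raw_statements = []
clean_statements = []
for line in fake_file:
    # Without stripping, the newline splits the text into two statements.
    raw_statements.append("b=" + line + "_bitscore")
    # With rstrip(), the text is the single assignment that was intended.
    clean_statements.append("b=" + line.rstrip() + "_bitscore")

# repr() makes the embedded newline visible.
print(repr(raw_statements[0]))
print(repr(clean_statements[0]))
```

The unstripped string contains two statements, b=ham_pb and then _bitscore, which is why the traceback complains about ham_pb rather than ham_pb_bitscore.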

Better: quit using exec and use dictionaries to avoid defining variables dynamically.
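
A rough sketch of what the dictionary version might look like (Python 3 syntax). The layout is my assumption: one dictionary per measurement, keyed by gene name. demo_gene is a made-up entry added only so that something passes the filter, and io.StringIO stands in for the names file:

```python
import io

# Hypothetical layout: values keyed by gene name instead of separate variables.
bitscore = {"ham_pb": 2165, "cg2225_ph": 2278, "demo_gene": 400}
length = {"ham_pb": 2973, "cg2225_ph": 3303, "demo_gene": 100}


def passes_filter(name):
    """Same condition as in the question, written as one chained comparison."""
    b = bitscore[name]
    l = length[name]
    return 2 * 0.95 * l <= 0.5 * b <= 2 * 1.05 * l


# Stand-in for the names file, including a trailing blank line.
names_file = io.StringIO("ham_pb\ncg2225_ph\ndemo_gene\n\n")

# strip() removes the newline; blank lines become "" and are skipped.
names = [line.strip() for line in names_file if line.strip()]
matches = [name for name in names if passes_filter(name)]
print(matches)
```

No exec is needed: the name read from the file is used directly as a dictionary key.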

Jean-François Fabre
  • I now just get the error `NameError: name 'ham_pb_bitscore' is not defined`. Could that be because it is defined after the block of code? If not, I'll try to rewrite it using a dictionary because I have read elsewhere that using exec this way isn't the best practice. – Louis Aug 10 '17 at 21:11
  • Yes, move that block above the loop. And yes, using exec isn't best practice; it is also rather unsafe (if you don't control what's in the file, it could lead to code injection). – Jean-François Fabre Aug 10 '17 at 21:16
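
To illustrate the code injection risk mentioned in the comment above, here is a small, harmless sketch (Python 3 syntax). The malicious line is invented for the example, and I pass an explicit namespace dictionary to exec so the effect is easy to inspect:

```python
# Hypothetical line from an untrusted names file.
line = "0; injected = 'arbitrary code ran' #\n"

# exec runs whatever the file author wrote, not just a variable lookup.
namespace = {}
exec("b=" + line.rstrip() + "_bitscore", namespace)

print(namespace["b"])
print(namespace["injected"])
```

The trailing # turns _bitscore into a comment, so the file's author decides which statements run.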

Put #!/usr/bin/env python as the first line. See this question for more explanation.

As Jean pointed out, exec is not the right tool for this job. Use a dictionary instead: it is less dangerous (search for "code injection") and easier to read. Here's an example of how to use dictionaries, taken from the Python documentation:

>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> list(tel.keys())
['irv', 'guido', 'jack']
>>> sorted(tel.keys())
['guido', 'irv', 'jack']
>>> 'guido' in tel
True
>>> 'jack' not in tel
False

Here's a way I can think of to accomplish your goal:

# If you have more gene data, add it to this dictionary literal.
gene_data = {'ham_pb_length': 2973, 'ham_pb_bitscore': 2165,
             'cg2225_ph_length': 3303, 'cg2225_ph_bitscore': 2278}

with open("seqnames-test1-iso-legal-temp.txt") as f:
    for line in f:
        if not line.isspace():  # skip blank lines
            name = line.rstrip()  # drop the trailing newline
            bitscore = gene_data[name + '_bitscore']
            length = gene_data[name + '_length']
            # Dividing by 4.0 forces true division on Python 2 as well.
            if 0.95*length <= bitscore/4.0 <= 1.05*length:
                print(name)

I take advantage of a few useful Python features here. In Python 3, 5/7 evaluates to 0.7142857142857143, not 0 as in many programming languages (including Python 2, where / on two integers floors the result, so divide by a float such as 4.0 there). If you want integer division in Python 3, use 5//7. Additionally, in Python 1<2<3 evaluates to True and 1<3<2 evaluates to False, because a chained comparison is read as 1<3 and 3<2. In many other programming languages, 1<2<3 is evaluated as True<3, which might give an error or evaluate to True depending on the language.
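
A quick way to check these two behaviours yourself, assuming Python 3 (the variable names are mine):

```python
# True division versus floor division in Python 3.
quotient = 5 / 7
floor_quotient = 5 // 7

# Chained comparisons are evaluated as "a < b and b < c".
chained_true = 1 < 2 < 3
chained_false = 1 < 3 < 2

# Forcing the left-to-right grouping used by many other languages:
# (1 < 3) is True, and True compares as 1, so True < 2 holds.
grouped = (1 < 3) < 2

print(quotient, floor_quotient)
print(chained_true, chained_false, grouped)
```

The last case shows why the chained form matters: grouped explicitly, 1<3<2 would come out True instead of False.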

asky
  • This seems to work for the most part because it prints 4 expected results, but then gives me an error: `KeyError: '_bitscore'` – Louis Aug 11 '17 at 17:29
  • Your file probably has some whitespace at the end. The `KeyError` means that you tried a dictionary lookup and it failed. On the line `bitscore = gene_data[line.rstrip()+'_bitscore']`, `line.rstrip()` is the empty string on some iteration. Python then runs it as `bitscore = gene_data[''+'_bitscore']`, which is the same as `bitscore = gene_data['_bitscore']`. To fix it, put `if not line.isspace():` directly under the `for` line, before the assignment statement (and indent the rest accordingly). This checks that the line is not made up entirely of whitespace characters. I changed my original answer to demonstrate. – asky Aug 11 '17 at 18:37
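
For anyone who wants to reproduce that KeyError in isolation, a minimal sketch (Python 3 syntax; the one-entry dictionary and the blank line are made up for the demonstration):

```python
gene_data = {"ham_pb_bitscore": 2165}

# A blank line read from a file still contains its newline character.
line = "\n"
key = line.rstrip() + "_bitscore"

try:
    gene_data[key]
    error_message = None
except KeyError as err:
    # str() of a KeyError is the repr of the missing key.
    error_message = str(err)

is_blank = line.isspace()
print(key, error_message, is_blank)
```

Because line.isspace() is true for such a line, the guard skips it before the lookup is attempted.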